SECRETED PROTEINS AND NUCLEIC ACIDS ENCODING THEM

Related Application Information 
This application is a continuation-in-part of
application serial number 09/164,169, filed October 2,
1998, which is a continuation-in-part of application
serial number 09/164,220, filed September 30, 1998.

Background of the Invention 

Many secreted proteins, for example, cytokines and
cytokine receptors, play a vital role in the regulation
of cell growth, cell differentiation, and a variety of
specific cellular responses. A number of medically
useful proteins, including erythropoietin,
granulocyte-macrophage colony stimulating factor, human
growth hormone, and various interleukins, are secreted
proteins. Thus, an important goal in the design and
development of new therapies is the identification and
characterization of secreted and transmembrane proteins
and the genes which encode them.

Many secreted proteins are receptors which bind a
ligand and transduce an intracellular signal, leading to
a variety of cellular responses. The identification and
characterization of such a receptor enables one to
identify both the ligands which bind to the receptor and
the intracellular molecules and signal transduction
pathways associated with the receptor, permitting one to
identify or design modulators of receptor activity, e.g.,
receptor agonists or antagonists and modulators of signal
transduction.



Summary of the Invention 
The present invention is based, at least in part, on
the discovery of cDNA molecules encoding TANGO 180, TANGO
181, TANGO 182, TANGO 183, TANGO 184, TANGO 185, TANGO
186, TANGO 187, TANGO 188, TANGO 189, and TANGO 215, all
of which are predicted to be either wholly secreted or
transmembrane proteins. These proteins, fragments,
derivatives, and variants thereof are collectively
referred to as "polypeptides of the invention" or
"proteins of the invention." Nucleic acid molecules
encoding polypeptides of the invention are collectively
referred to as "nucleic acids of the invention."

The nucleic acids and polypeptides of the present
invention are useful as modulating agents in regulating a
variety of cellular processes. Accordingly, in one
aspect, the present invention provides isolated nucleic
acid molecules encoding a polypeptide of the invention or
a biologically active portion thereof. The present
invention also provides nucleic acid molecules which are
suitable as primers or hybridization probes for the
detection of nucleic acids encoding a polypeptide of the
invention.

The invention features nucleic acid molecules which are
at least 45% (or 55%, 65%, 75%, 85%, 95%, or 98%)
identical to the nucleotide sequence of any of SEQ ID
NOs:1-22, 34-43, and - or the nucleotide sequence of the
cDNA of a clone deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901 (the "cDNA of a
clone deposited as any of ATCC 98899, 98900, and
98901"), or a complement thereof.

The invention features nucleic acid molecules which
include a fragment of at least 300 (325, 350, 375, 400,
425, 450, 500, 550, 600, 650, 700, 800, 900, 1000, or
1200) nucleotides of the nucleotide sequence of any of
SEQ ID NOs:1-22, 34-43, and - or the nucleotide
sequence of the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901, or a complement thereof.
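The fragment-or-complement language above can be made concrete with a small check. This is an illustrative sketch only: the function names are hypothetical, the claimed sequences (SEQ ID NOs and deposited cDNAs) are not reproduced in this section, so a toy sequence stands in for them, and "complement" is read here as the reverse complement of the reference strand, which is an assumption.

```python
# Hypothetical helpers; a toy sequence stands in for the SEQ ID NOs.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_contiguous_fragment(candidate: str, reference: str, min_len: int = 300) -> bool:
    """True if `candidate` has at least `min_len` nucleotides and occurs as a
    contiguous stretch of `reference` or of its reverse complement."""
    candidate = candidate.upper()
    if len(candidate) < min_len:
        return False
    reference = reference.upper()
    return candidate in reference or candidate in reverse_complement(reference)


if __name__ == "__main__":
    toy_reference = "ATGC" * 100  # 400 nt stand-in for a claimed sequence
    print(is_contiguous_fragment(toy_reference[10:320], toy_reference))  # 310 nt
    print(is_contiguous_fragment(toy_reference[:100], toy_reference))    # too short
```

Run as a script, this prints `True` and then `False`. The `min_len` parameter corresponds to whichever of the listed lengths (300, 325, ..., 1200) is being tested.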
The invention also features nucleic acid molecules
which include a nucleotide sequence encoding a protein
having an amino acid sequence that is at least 45% (or
55%, 65%, 75%, 85%, 95%, or 98%) identical to the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and -
or the amino acid sequence encoded by the cDNA of a
clone deposited as any of ATCC 98899, 98900, and 98901,
or a complement thereof.

In preferred embodiments, the nucleic acid molecules
have the nucleotide sequence of any of SEQ ID NOs:1-22,
34-43, and - or the nucleotide sequence of the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
98901.

Also within the invention are nucleic acid molecules
which encode a fragment of a polypeptide having the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and
-, the fragment including at least 15 (25, 30, 50,
100, 150, 300, or 400) contiguous amino acids of any of
SEQ ID NOs:23-33, 54-63, and - or the polypeptide
encoded by the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901.

The invention includes nucleic acid molecules which
encode a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and - or an amino acid
sequence encoded by the cDNA of a clone deposited as any
of ATCC 98899, 98900, and 98901, wherein the nucleic
acid molecule hybridizes under stringent conditions to a
nucleic acid molecule having a nucleic acid sequence
encoding any of SEQ ID NOs:22-33, 54-63, and -,
or a complement thereof.

Also within the invention are isolated polypeptides or
proteins having an amino acid sequence that is at least
about 65%, preferably 75%, 85%, 95%, or 98% identical to
the amino acid sequence of any of SEQ ID NOs:22-33,
54-63, and -.

Also within the invention are isolated polypeptides or
proteins which are encoded by a nucleic acid molecule
having a nucleotide sequence that is at least about 65%,
preferably 75%, 85%, or 95% identical to the nucleic acid
sequence encoding any of SEQ ID NOs:22-33, 54-63, and
-, and isolated polypeptides or proteins which are
encoded by a nucleic acid molecule having a nucleotide
sequence which hybridizes under stringent hybridization
conditions to a nucleic acid molecule having the sequence
of any of SEQ ID NOs:1-22, 34-43, and -, or a
complement thereof, or the non-coding strand of the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
98901.

Also within the invention are polypeptides which are
naturally occurring allelic variants of a polypeptide
that includes the amino acid sequence of any of SEQ ID
NOs:22-33, 54-63, and - or an amino acid sequence
encoded by the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes under
stringent conditions to a nucleic acid molecule having
the sequence of any of SEQ ID NOs:1-22, 34-43, and -
or a complement thereof.

The invention also features nucleic acid molecules that
hybridize under stringent conditions to a nucleic acid
molecule comprising the nucleotide sequence of any of SEQ
ID NOs:1-22, 34-43, and -, or of the cDNA of a clone
deposited as any of ATCC 98899, 98900, and 98901, or a
complement thereof. In other embodiments, the nucleic
acid molecules are at least 300 (325, 350, 375, 400, 425,
450, 500, 550, 600, 650, 700, 800, 900, 1000, or 1290)
nucleotides in length and hybridize under stringent
conditions to a nucleic acid molecule comprising the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
-, or of the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901, or a complement thereof. In
preferred embodiments, the isolated nucleic acid
molecules encode a cytoplasmic, transmembrane, or
extracellular domain of a polypeptide of the invention.
In another embodiment, the invention provides an isolated
nucleic acid molecule which is antisense to the coding
strand of a nucleic acid of the invention.

Another aspect of the invention provides vectors, e.g.,
recombinant expression vectors, comprising a nucleic acid
molecule of the invention. In another embodiment, the
invention provides host cells containing such a vector.
The invention also provides methods for producing a
polypeptide of the invention by culturing, in a suitable
medium, a host cell of the invention containing a
recombinant expression vector encoding a polypeptide of
the invention such that the polypeptide of the invention
is produced.

Another aspect of this invention features isolated or
recombinant proteins and polypeptides of the invention.
Preferred proteins and polypeptides possess at least one
biological activity possessed by the corresponding
naturally-occurring human polypeptide. An activity, a
biological activity, and a functional activity of a
polypeptide of the invention refer to an activity
exerted by a protein or polypeptide of the invention on a
responsive cell as determined in vivo or in vitro,
according to standard techniques. Such activities can be
a direct activity, such as an association with or an
enzymatic activity on a second protein, or an indirect
activity, such as a cellular signaling activity mediated
by interaction of the protein with a second protein.
Thus, such activities include, e.g., (1) the ability to
form protein-protein interactions with proteins in the
signaling pathway of the naturally-occurring
polypeptide; (2) the ability to bind a ligand of the
naturally-occurring polypeptide; and (3) the ability to
bind to an intracellular target of the naturally-occurring
polypeptide. Other activities include: (1) the ability
to modulate cellular proliferation; (2) the ability to
modulate cellular differentiation; and (3) the ability to
modulate cell death.

In one embodiment, a polypeptide of the invention has
an amino acid sequence sufficiently identical to an
identified domain of a polypeptide of the invention. As
used herein, the term "sufficiently identical" refers to
a first amino acid or nucleotide sequence which contains
a sufficient or minimum number of identical or equivalent
(e.g., with a similar side chain) amino acid residues or
nucleotides to a second amino acid or nucleotide sequence
such that the first and second amino acid or nucleotide
sequences have a common structural domain and/or common
functional activity. For example, amino acid or
nucleotide sequences which contain a common structural
domain having about 65% identity, preferably 75%
identity, more preferably 85%, 95%, or 98% identity are
defined herein as sufficiently identical.
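As an illustration of the percentage thresholds used in this definition, the sketch below computes percent identity over a pairwise alignment that has already been produced. It is a hypothetical helper, not a method specified in this description: it assumes both sequences are supplied as equal-length aligned strings with "-" marking gaps, counts only identical non-gap columns as matches, and ignores columns where both sequences carry a gap. Other conventions (for example, dividing by the length of the shorter sequence) give different numbers.

```python
GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both residues are identical.

    Columns where both sequences carry a gap are ignored; a column with a gap
    in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == GAP and b == GAP)]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def sufficiently_identical(aligned_a: str, aligned_b: str, threshold: float = 65.0) -> bool:
    """Apply one of the identity thresholds named in the text (65% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 9 of 10 columns identical
    print(sufficiently_identical("ACD-F", "ACDEF", threshold=85.0))
```

The first call prints `90.0`; the second prints `False`, because the single gap column brings identity to 80%, below an 85% threshold.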

In one embodiment, the isolated polypeptide of the
invention lacks both a transmembrane and a cytoplasmic
domain. In another embodiment, the polypeptide lacks
both a transmembrane domain and a cytoplasmic domain and
is soluble under physiological conditions.

The polypeptides of the present invention, or
biologically active portions thereof, can be operably
linked to a heterologous amino acid sequence to form
fusion proteins. The invention further features
antibodies that specifically bind a polypeptide of the
invention, such as monoclonal or polyclonal antibodies.

In addition, the polypeptides of the invention or
biologically active portions thereof can be incorporated
into pharmaceutical compositions, which optionally
include pharmaceutically acceptable carriers.

In another aspect, the present invention provides
methods for detecting the presence of the activity or
expression of a polypeptide of the invention in a
biological sample by contacting the biological sample
with an agent capable of detecting an indicator of
activity such that the presence of activity is detected
in the biological sample.

In another aspect, the invention provides methods for
modulating activity of a polypeptide of the invention
comprising contacting a cell with an agent that modulates
(inhibits or stimulates) the activity or expression of a
polypeptide of the invention such that activity or
expression in the cell is modulated. In one embodiment,
the agent is an antibody that specifically binds to a
polypeptide of the invention.

In another embodiment, the agent modulates expression
of a polypeptide of the invention by modulating
transcription, splicing, or translation of an mRNA
encoding a polypeptide of the invention. In yet another
embodiment, the agent is a nucleic acid molecule having a
nucleotide sequence that is antisense to the coding
strand of an mRNA encoding a polypeptide of the
invention.
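The antisense embodiment relies on antiparallel base pairing between the antisense molecule and the coding (sense) strand of an mRNA. The hypothetical helper below shows only that sequence arithmetic, assuming the input is a sense-strand sequence written 5' to 3' (T or U accepted); it says nothing about target-site selection, chemistry, or delivery, none of which this section specifies.

```python
# Hypothetical helper: base-pairing rules for writing an RNA antisense strand.
SENSE_TO_ANTISENSE = {"A": "U", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_rna(sense: str) -> str:
    """Return the RNA strand that pairs antiparallel with `sense`, written 5'->3'."""
    return "".join(SENSE_TO_ANTISENSE[base] for base in reversed(sense.upper()))


if __name__ == "__main__":
    target = "AUGGCC"  # toy stand-in for a stretch of mRNA
    probe = antisense_rna(target)
    print(probe)                           # GGCCAU
    print(antisense_rna(probe) == target)  # True: the two strands pair fully
```

Applying the function twice returns the original RNA sequence, which is a quick way to confirm that the two strands are fully complementary.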

The present invention also provides methods to treat a
subject having a disorder characterized by aberrant
activity of a polypeptide of the invention or aberrant
expression of a nucleic acid of the invention by
administering an agent which is a modulator of the
activity of a polypeptide of the invention or a modulator
of the expression of a nucleic acid of the invention to
the subject. In one embodiment, the modulator is a
protein of the invention. In another embodiment, the
modulator is a nucleic acid of the invention. In other
embodiments, the modulator is a peptide, peptidomimetic,
or other small molecule.

The present invention also provides diagnostic assays
for identifying the presence or absence of a genetic
lesion or mutation characterized by at least one of: (i)
aberrant modification or mutation of a gene encoding a
polypeptide of the invention, (ii) mis-regulation of a
gene encoding a polypeptide of the invention, and (iii)
aberrant post-translational modification of a polypeptide
of the invention, wherein a wild-type form of the gene
encodes a polypeptide having the activity of the
polypeptide of the invention.

In another aspect, the invention provides a method for
identifying a compound that binds to or modulates the
activity of a polypeptide of the invention. In general,
such methods entail measuring a biological activity of
the polypeptide in the presence and absence of a test
compound and identifying those compounds which alter the
activity of the polypeptide.
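The screening method compares the activity of the polypeptide with and without a test compound. A minimal sketch of that comparison follows; the `Measurement` class, the `classify_compounds` function, and the 1.5-fold cut-off are hypothetical choices made for illustration, since the text does not say how large a change counts as altering activity.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    compound: str
    activity: float  # polypeptide activity measured with the compound present


def classify_compounds(control_activity: float, measurements: list[Measurement],
                       fold_cutoff: float = 1.5) -> list[tuple[str, str, float]]:
    """Flag compounds whose activity ratio to the no-compound control is at least
    `fold_cutoff` (stimulator) or at most 1/`fold_cutoff` (inhibitor)."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    hits = []
    for m in measurements:
        ratio = m.activity / control_activity
        if ratio >= fold_cutoff:
            hits.append((m.compound, "stimulates", ratio))
        elif ratio <= 1.0 / fold_cutoff:
            hits.append((m.compound, "inhibits", ratio))
    return hits


if __name__ == "__main__":
    data = [Measurement("cmpd-A", 200.0), Measurement("cmpd-B", 50.0),
            Measurement("cmpd-C", 110.0)]
    print(classify_compounds(100.0, data))
```

With a control activity of 100, `cmpd-A` is flagged as a stimulator (ratio 2.0), `cmpd-B` as an inhibitor (ratio 0.5), and `cmpd-C` falls inside the cut-off band and is not reported. A real screen would also need replicate measurements and a statistical test, which this sketch omits.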

The invention also features methods for identifying a
compound which modulates the expression of a polypeptide
or nucleic acid of the invention by measuring the
expression of the polypeptide or nucleic acid in the
presence and absence of the compound.

Other features and advantages of the invention will be
apparent from the following detailed description and
claims.



Brief Description of the Drawings 
Figure 1 depicts the cDNA sequence (SEQ ID NO:1) and
predicted amino acid sequence (SEQ ID NO:23) of human
TANGO 180.
Figure 2 depicts the cDNA sequence (SEQ ID NO:34) and
predicted amino acid sequence (SEQ ID NO:54) of murine
TANGO 180.

Figure 3 depicts the cDNA sequence (SEQ ID NO:2) and
predicted amino acid sequence (SEQ ID NO:24) of human
TANGO 181.

Figure 4 depicts the partial cDNA sequence (SEQ ID
NO:35; partial) and predicted amino acid sequence (SEQ ID
NO:55; partial) of murine TANGO 181.

Figure 5 depicts the cDNA sequence (SEQ ID NO:3) and
predicted amino acid sequence (SEQ ID NO:25) of human
TANGO 182.

Figure 6 depicts the partial cDNA sequence (SEQ ID
NO:36; partial) and predicted amino acid sequence (SEQ ID
NO:56; partial) of murine TANGO 182.

Figure 7 depicts the cDNA sequence (SEQ ID NO: 4) and 
predicted amino acid sequence (SEQ ID NO: 26) of human 
TANGO 183. 

Figure 8 depicts the cDNA sequence (SEQ ID NO:37) and
predicted amino acid sequence (SEQ ID NO:57) of murine
TANGO 183.

Figure 9 depicts the cDNA sequence (SEQ ID NO: 5) and 
predicted amino acid sequence (SEQ ID NO: 27) of human 
TANGO 184. 

Figure 10 depicts the cDNA sequence (SEQ ID NO:38) and
predicted amino acid sequence (SEQ ID NO:58) of murine
TANGO 184.

Figure 11 depicts the cDNA sequence (SEQ ID NO:6) and
predicted amino acid sequence (SEQ ID NO:28) of human
TANGO 185.

Figure 12 depicts the cDNA sequence (SEQ ID NO: 39) and 
predicted amino acid sequence (SEQ ID NO: 59) of murine 
TANGO 185. 
Figure 13 depicts the cDNA sequence (SEQ ID NO:7) and
predicted amino acid sequence (SEQ ID NO:29) of human
TANGO 186.

Figure 14 depicts the cDNA sequence (SEQ ID NO:40) and
predicted amino acid sequence (SEQ ID NO:60) of murine
TANGO 186.

Figure 15 depicts the cDNA sequence (SEQ ID NO: 8) and 
predicted amino acid sequence (SEQ ID NO: 30) of human 
TANGO 188. 

Figure 16 depicts the cDNA sequence (SEQ ID NO:41) and
predicted amino acid sequence (SEQ ID NO:61) of murine
TANGO 188.

Figure 17 depicts the cDNA sequence (SEQ ID NO:9) and
predicted amino acid sequence (SEQ ID NO:31) of human
TANGO 189.

Figure 18 depicts the cDNA sequence (SEQ ID NO: 42) and 
predicted amino acid sequence (SEQ ID NO: 62) of murine 
TANGO 189. 

Figure 19 depicts the cDNA sequence (SEQ ID NO:10) and
predicted amino acid sequence (SEQ ID NO:32) of human
TANGO 215.

Figure 20 depicts the cDNA sequence (SEQ ID NO:11) and
predicted amino acid sequence (SEQ ID NO:22) of human
TANGO 187-1/3.

Figure 21 depicts the cDNA sequence (SEQ ID NO:43;
partial) and predicted amino acid sequence of murine
TANGO 187 (SEQ ID NO:63; partial).

Figure 22 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:23) and murine (SEQ ID
NO:54) TANGO 180.

Figure 23 depicts an alignment of the predicted amino 
acid sequences of human (SEQ ID NO: 24) and murine (SEQ ID 
NO: 55; partial) TANGO 181. 
Figure 24 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:25) and murine (SEQ ID
NO:56; partial) TANGO 182.

Figure 25 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:26) and murine (SEQ ID
NO:57) TANGO 183.

Figure 26 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:27) and murine (SEQ ID
NO:58) TANGO 184.

Figure 27 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:28) and murine (SEQ ID
NO:59) TANGO 185.

Figure 28 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:29) and murine (SEQ ID
NO:60) TANGO 186.

Figure 29 depicts an alignment of the predicted amino 
acid sequences of human (SEQ ID NO: 30) and murine (SEQ ID 
NO:61) TANGO 188. 

Figure 30 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:31) and murine (SEQ ID
NO:62) TANGO 189.

Figure 31 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:33) and murine (SEQ ID
NO:63; partial) TANGO 187.

Figure 32 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:1) and murine (SEQ ID NO:34) TANGO 180.

Figure 33 depicts an alignment of the cDNA sequences of 
human (SEQ ID NO: 2) and murine (SEQ ID NO: 35; partial) 
TANGO 181. 

Figure 34 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:3) and murine (SEQ ID NO:36; partial)
TANGO 182.

Figure 35 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:4) and murine (SEQ ID NO:37) TANGO 183.

Figure 36 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:5) and murine (SEQ ID NO:38) TANGO 184.

Figure 37 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:6) and murine (SEQ ID NO:39) TANGO 185.

Figure 38 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:7) and murine (SEQ ID NO:40) TANGO 186.

Figure 39 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:8) and murine (SEQ ID NO:41) TANGO 188.

Figure 40 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:9) and murine (SEQ ID NO:42) TANGO 189.

Figure 41 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:11) and murine (SEQ ID NO:43; partial)
TANGO 187.

Figure 42 depicts an alignment of the amino acid
sequences of human TANGO 181 (SEQ ID NO:24), murine TANGO
181 (SEQ ID NO:55; partial), human TANGO 182 (SEQ ID
NO:25), and murine TANGO 182 (SEQ ID NO:56; partial).

Figure 43 depicts an alignment of the amino acid
sequences of human TANGO 184 (SEQ ID NO:27) and human
TANGO 183 (SEQ ID NO:26).

Figure 44 depicts an alignment of the amino acid
sequences of murine TANGO 184 (SEQ ID NO:58) and murine
TANGO 183 (SEQ ID NO:57).

Figure 45 depicts an alignment of the amino acid
sequences of human TANGO 180 (SEQ ID NO:23), murine TANGO
180 (SEQ ID NO:54), agkistrodon PLA2 (SEQ ID NO:109),
acanthahis PLA2 (SEQ ID NO:110), and bovine PLA2 (SEQ ID
NO:111).

Figure 46 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187-1.

Figure 47 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187-2/3.
Figure 48 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187-1/2/3.

Figure 49 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187-1/2.

Figure 50 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187-2.

Figure 51 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187-3.

Figure 52 depicts the cDNA sequence (SEQ ID NO: ) and
predicted amino acid sequence (SEQ ID NO: ) of TANGO
187.

Figure 53 depicts a complete cDNA sequence (SEQ ID
NO: ) and predicted amino acid sequence (SEQ ID NO: )
of murine TANGO 181.

Figure 54 depicts a complete cDNA sequence (SEQ ID
NO: ) and predicted amino acid sequence (SEQ ID NO: )
of murine TANGO 182.

Figure 55 depicts a complete cDNA sequence (SEQ ID
NO: ) and predicted amino acid sequence (SEQ ID NO: )
of murine TANGO 187.

Figure 56 depicts a complete cDNA sequence (SEQ ID
NO: ) and predicted amino acid sequence (SEQ ID NO: )
of murine TANGO 215.

Detailed Description of the Invention 
The present invention is based on the discovery of cDNA
molecules encoding TANGO 180, TANGO 181, TANGO 182, TANGO
183, TANGO 184, TANGO 185, TANGO 186, TANGO 188, TANGO
189, TANGO 215, and TANGO 187, all of which are predicted
to be either wholly secreted or transmembrane proteins.
TANGO 180

The human TANGO 180 cDNA of SEQ ID NO:1 has a 567
nucleotide open reading frame (SEQ ID NO:12) encoding a
189 amino acid protein (SEQ ID NO:23). The cDNA and
protein sequences of human TANGO 180 are shown in Figure
1.

Human TANGO 180 is predicted to be a wholly secreted
protein having a 22 amino acid signal sequence (amino
acids 1-22 of SEQ ID NO:23; SEQ ID NO:64) followed by a
167 amino acid mature protein (amino acids 23-189 of
SEQ ID NO:23; SEQ ID NO:76). TANGO 180 is predicted to
have a molecular weight of 21.0 kDa prior to cleavage of
its signal peptide and a molecular weight of 18.5 kDa
subsequent to cleavage of its signal peptide.
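The lengths given for human TANGO 180 are related by simple arithmetic: each codon is three nucleotides, and the mature protein is what remains after the N-terminal signal sequence is removed. The sketch below (hypothetical helper names) checks those relationships. Each open reading frame length given in this description is exactly three times the stated protein length, so the sketch assumes by default that the stated lengths exclude the stop codon.

```python
def protein_length(orf_nt: int, includes_stop: bool = False) -> int:
    """Number of amino acids encoded by an open reading frame of `orf_nt` nucleotides."""
    coding_nt = orf_nt - 3 if includes_stop else orf_nt
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("open reading frame length must be a positive multiple of 3")
    return coding_nt // 3


def mature_span(precursor_aa: int, signal_aa: int) -> tuple[int, int]:
    """1-based, inclusive residue range left after removing an N-terminal signal sequence."""
    if not 0 < signal_aa < precursor_aa:
        raise ValueError("signal sequence must be shorter than the precursor")
    return signal_aa + 1, precursor_aa


if __name__ == "__main__":
    length = protein_length(567)            # human TANGO 180 open reading frame
    start, end = mature_span(length, 22)    # 22 amino acid signal sequence
    print(length, start, end, end - start + 1)  # 189 23 189 167
```

The same two calls reproduce the human TANGO 181 figures (1017 nucleotides, 339 residues, mature protein at residues 23-339) and the human TANGO 182 figures (1044 nucleotides, 348 residues, mature protein at residues 24-348) given below.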

The murine TANGO 180 cDNA of SEQ ID NO:34 has a 576
nucleotide open reading frame (SEQ ID NO:44) encoding a
192 amino acid protein (SEQ ID NO:54). The cDNA and
protein sequences of murine TANGO 180 are shown in Figure
2.

Figure 22 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:23) and murine (SEQ
ID NO:54) TANGO 180 (88.7% identity). Figure 32 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:1)
and murine (SEQ ID NO:34) TANGO 180 (55% identity).

Northern analysis of human TANGO 180 mRNA expression
revealed the presence of two major transcripts (1.3 and
5.25 kb) and three minor transcripts (0.95, 1.8, and 4.15
kb). This analysis also revealed that all five
transcripts are expressed at a low level in placenta,
lung, and liver; that the 1.3 and the 5.25 kb transcripts
are expressed at a moderate level in brain and kidney;
that the 5.25 kb transcript is expressed at a moderate
level in heart, skeletal muscle, and pancreas; and that
the 1.3 kb transcript is expressed at a high level in
heart, skeletal muscle, and pancreas.
In situ expression analysis of TANGO 180 in adult
murine tissue revealed no significant expression in
bladder, pancreas, heart, thymus, kidney, brain, colon,
placenta, eye, liver, spleen, lung, skeletal
muscle/diaphragm, or small intestine. In situ expression
analysis of murine embryonic tissue revealed expression
in the liver at E13.5 through E16.5. Liver expression
was also observed, although at a lower level, at E17.5
and P1.5.

TANGO 180 maps to human chromosome location 4q25.
TANGO 180 is predicted to have a phospholipase A2
histidine active site domain at amino acids 106-113 of
SEQ ID NO:23 and a phospholipase A2 aspartic acid active
site-like domain at amino acids 124-131 of SEQ ID NO:23.

An apparent genomic sequence of TANGO 180 appears at
GenBank Accession Number AC004067.

Human TANGO 180 bears some similarity to a number of C.
elegans proteins.

TANGO 180 bears some similarity to a number of known
phospholipase A2 (PLA2) proteins (Lambeau et al. (1994)
J. Biol. Chem. 269:1575-78; Lambeau et al. (1995) J.
Biol. Chem. 270:5534-40). TANGO 180 may play a role
similar to that of a phospholipase A2. Figure 45
depicts an alignment of the amino acid sequences of
human TANGO 180 (SEQ ID NO:23), murine TANGO 180 (SEQ ID
NO:54), agkistrodon PLA2 (SEQ ID NO:109), acanthahis PLA2
(SEQ ID NO:110), and bovine PLA2 (SEQ ID NO:111). There
are thought to be at least two important regions within
many PLA2s: CCXXHCCX (histidine active site) and
[LIVMA]C{LIVMFYWPCST}CDXXXXXC (aspartic acid active
site), where X denotes any residue, [...] any one of the
listed residues, and {...} any residue except those
listed. Various phospholipase A2 proteins are thought to
be involved in inflammation. Moreover, it appears that
the expression and synthesis of at least some
phospholipase A2 proteins are induced by pro-inflammatory
modulators such as interleukin-1, interleukin-6, and
tumor necrosis factor. Thus, TANGO 180 may be involved
in inflammation, e.g., arthritis, endotoxic shock,
peritonitis, psoriasis, acute pancreatitis, and
respiratory distress syndrome. Accordingly, TANGO 180
nucleic acid molecules and polypeptides as well as
anti-TANGO 180 antibodies and modulators of TANGO 180
expression or activity may be useful in the treatment of
such disorders. Moreover, PLA2s have been implicated in
digestion, airway contraction, smooth muscle
contraction, fertilization, and cell proliferation.
Thus, TANGO 180 nucleic acid molecules and polypeptides
as well as anti-TANGO 180 antibodies and modulators of
TANGO 180 expression or activity may be useful in the
treatment of disorders of digestion, airway contraction,
smooth muscle contraction, fertilization, and cell
proliferation.
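The two PLA2 active-site patterns discussed above can be searched for with regular expressions. In the sketch below the histidine pattern is taken as written in the text (X = any residue), and the aspartic acid pattern is read in PROSITE-style notation, where [...] allows any listed residue and {...} allows any residue except those listed; that reading of the flattened pattern is an assumption. The function name is hypothetical, and short toy sequences are used because the TANGO 180 sequence itself appears only in the figures.

```python
import re

# Patterns as read from the text; "." stands for X (any residue).
PLA2_PATTERNS = {
    "histidine active site": re.compile(r"CC..HCC."),
    "aspartic acid active site": re.compile(r"[LIVMA]C[^LIVMFYWPCST]CD.{5}C"),
}


def find_pla2_sites(protein: str) -> list[tuple[str, int, int, str]]:
    """Return (site name, start, end, matched residues) for every match,
    using 1-based inclusive coordinates; overlapping matches are reported."""
    protein = protein.upper()
    hits = []
    for name, pattern in PLA2_PATTERNS.items():
        for offset in range(len(protein)):
            match = pattern.match(protein, offset)  # anchored at `offset`
            if match:
                hits.append((name, match.start() + 1, match.end(), match.group()))
    return hits


if __name__ == "__main__":
    print(find_pla2_sites("MKCCAAHCCGW"))
    print(find_pla2_sites("ICNCDRNAAIC"))
```

The first toy sequence yields a histidine-site hit at residues 3-10 (`CCAAHCCG`); the second yields an aspartic acid-site hit spanning residues 1-11. Reporting 1-based inclusive coordinates matches the way residue ranges such as 106-113 are given in the text.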

TANGO 181 

The human TANGO 181 cDNA of SEQ ID NO:2 has a 1017
nucleotide open reading frame (SEQ ID NO:13) encoding a
339 amino acid protein (SEQ ID NO:24). The cDNA and
protein sequences of human TANGO 181 are shown in Figure
3.

Human TANGO 181 is predicted to be a secreted protein
having a 22 amino acid signal sequence (amino acids 1-22
of SEQ ID NO:24; SEQ ID NO:65) followed by a 317 amino
acid mature protein (amino acids 23-339 of SEQ ID
NO:24; SEQ ID NO:77). TANGO 181 is predicted to have a
molecular weight of 37.8 kDa prior to cleavage of its
signal peptide and a molecular weight of 35.2 kDa
subsequent to cleavage of its signal peptide.
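Predicted molecular weights such as those above are commonly estimated by summing average residue masses and adding one water molecule for the free termini. The sketch below does this with standard average residue masses; whether the authors used this exact method is not stated, and the function name is hypothetical. Because the TANGO 181 sequence appears only in Figure 3, the example uses a toy peptide.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH


def molecular_weight_kda(protein: str) -> float:
    """Approximate average molecular weight of an unmodified polypeptide, in kDa."""
    protein = protein.upper()
    unknown = set(protein) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    daltons = sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER
    return daltons / 1000.0


if __name__ == "__main__":
    print(round(molecular_weight_kda("GGG") * 1000, 2))  # triglycine, about 189.17 Da
```

This estimate ignores glycosylation and other modifications. For TANGO 181, the stated drop from 37.8 kDa to 35.2 kDa on removal of the 22-residue signal peptide corresponds to about 118 Da per removed residue.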

The murine TANGO 181 partial cDNA of SEQ ID NO:35 has a
747 nucleotide open reading frame (SEQ ID NO:45) encoding
a 249 amino acid protein (SEQ ID NO:55). The partial
cDNA and protein sequences of murine TANGO 181 are shown
in Figure 4.
Figure 23 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:24) and murine (SEQ
ID NO:55; partial) TANGO 181 (72.1% identity). Figure 33
depicts an alignment of the cDNA sequences of human (SEQ
ID NO:2) and murine (SEQ ID NO:35; partial) TANGO 181
(65.4% identity). The pair of cysteines at amino acids
76 and 129 might be important for disulfide bond
formation. The single cysteine at amino acid 262 might
enable TANGO 181 to form homodimers (or heterodimers with
TANGO 182).

The cDNA sequence (SEQ ID NO: ) and predicted amino
acid sequence (SEQ ID NO: ) of a full-length murine
TANGO 181 clone are shown in Figure 53.

Northern analysis of human TANGO 181 mRNA expression
revealed the presence of two transcripts (4.3 and 4.5 kb)
expressed at a low level in heart, brain, placenta, lung,
liver, skeletal muscle, kidney, and pancreas, with the
level of expression in the pancreas being higher than in
the other tissues.

Murine in situ expression analysis revealed that TANGO
181 is weakly expressed in adult brain (choroid plexus
and olfactory bulb). This analysis also revealed TANGO
180 expression in the liver and kidney (medulla). High
level TANGO 180 expression was observed in testis. This
analysis detected little or no expression of TANGO 181 in
adult liver, ovary, heart, lung, spleen, fat, muscle,
skin, stomach, duodenum, colon, pancreas, thymus,
pituitary, and eye. In situ expression analysis of
embryos revealed that TANGO 181 is ubiquitously expressed
at stages E12.5, E13.5, and E14.5.

TANGO 181 maps to human chromosome location 8p12.
WI-5768 and AFMB057WG5 are markers which flank TANGO 181.
Nearby loci include WRN (Werner Syndrome) and SPG5A
(Spastic Paraplegia 5A), and nearby known genes include
FGFR1 (fibroblast growth factor receptor), STAR
(steroidogenic acute regulatory protein), ANK1 (ankyrin
1), CALB1 (calbindin 1), and CHRNB3 (cholinergic
receptor, nicotinic). The human chromosomal location
corresponds to a position on mouse chromosome 8 near
fgfr1 (fibroblast growth factor receptor), cyrn
(cyritestin 1), tissue plasminogen activator, and ank
(ankyrin 1).

Within the 3' untranslated region of the human TANGO
181 cDNA described above is a 260 base pair sequence
(GenBank Accession Number Z36802) previously identified
as part of a gene that appears to be preferentially
expressed in pancreatic cancer and chronic pancreatitis
(Gress et al. (1996) Oncogene 13:1819-30). Thus, TANGO
181 nucleic acids and polypeptides may be useful for the
diagnosis and/or treatment of chronic pancreatitis and
pancreatic cancer (as well as other cancers). In
addition, modulators of TANGO 181 expression or activity
may be useful in the treatment of such disorders.

TANGO 181 and TANGO 182 are highly homologous to the C.
elegans protein C42C1.9.

TANGO 182

The human TANGO 182 cDNA of SEQ ID NO:3 has a 1044
nucleotide open reading frame (SEQ ID NO:14) encoding a
348 amino acid protein (SEQ ID NO:25). The cDNA and
protein sequences of human TANGO 182 are shown in Figure
5.

Human TANGO 182 is predicted to be a secreted protein
having a 23 amino acid signal sequence (amino acids 1-23
of SEQ ID NO:25; SEQ ID NO:66) followed by a 325 amino
acid mature protein (amino acids 24-348 of SEQ ID
NO:25; SEQ ID NO:78). TANGO 182 is predicted to have a
molecular weight of 39.2 kDa prior to cleavage of its
signal peptide and a molecular weight of 36.1 kDa
subsequent to cleavage of its signal peptide.
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The murine TANGO 182 partial cDNA of SEQ ID NO: 36 has 
an 825 nucleotide open reading frame (SEQ ID NO: 46) 
encoding a 275 amino acid protein (SEQ ID NO: 56) . The 
partial cDNA and protein sequences of murine TANGO 182 
5 are shown in Figure 6. Figure 24 depicts an alignment 

of the predicted amino acids sequences of human (SEQ ID 
NO:25) and murine (SEQ ID NO:56; partial) TANGO 182 
(75.1% identity) . Figure 34 depicts an alignment of the 
cDNA sequences of human (SEQ ID NO: 3) and murine (SEQ ID 

10 NO:36; partial) TANGO 182 (67.6% identity). The pair of 
cysteines at amino acids 78 and 130 might be important 
for disulfide bond formation. The single cysteine at 
amino acid 312 might enable TANGO 182 to form homodimers 
(or heterodimers with TANGO 181) . 
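The percent identities reported for the human and murine alignments (e.g., 75.1% for the TANGO 182 amino acid sequences) depend on how identity is computed. This section does not specify the alignment program, the treatment of gaps, or the denominator. As a hedged illustration only, the Python sketch below computes identity over those columns of a pairwise alignment in which neither sequence has a gap. The function name percent_identity and the toy sequences are hypothetical and are not TANGO sequences. Other conventions divide by the full alignment length or by the length of the shorter sequence, and would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences over ungapped columns.

    Columns where either sequence has a gap are skipped; this is one of
    several conventions, chosen here for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped column: not counted under this convention
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: 6 ungapped columns, 5 identical -> 83.3
    print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))
```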

The cDNA sequence (SEQ ID NO: ) and predicted amino
acid sequence (SEQ ID NO: ) of a full-length murine
TANGO 182 clone are shown in Figure 54.

TANGO 182 maps to human chromosomal location 10q24
between markers D10S566 and D10S540. In mice, TANGO 182
maps to chromosome 10 between D10S198 and D10S192 (129.8
to 131.2 cM).

Northern analysis of human TANGO 182 mRNA expression
revealed the presence of a 2.8 kb transcript that is
expressed at a high level in placenta and at a somewhat
lower level in liver, kidney, and pancreas. This
transcript is expressed at a low level in heart, brain,
lung, and skeletal muscle.

Murine in situ expression analysis revealed that TANGO
182 is expressed at a high level in testis in adult mice.
Little or no expression was detected in adult brain,
liver, kidney, ovary, heart, lung, spleen, fat, muscle,
skin, stomach, duodenum, colon, pancreas, thymus,
pituitary, or eye by in situ analysis. In situ
expression analysis of embryos revealed ubiquitous, low
level expression at stages E12.5, E13.5, and E14.5.

Both human and mouse TANGO 182 are quite similar to
human and murine TANGO 181 at the amino acid level
(Figure 42). Thus, TANGO 182, like TANGO 181, may be
useful for the diagnosis and/or treatment of pancreatic
cancer and chronic pancreatitis as well as other cancers.
In addition, TANGO 182 bears some similarity to the C.
elegans protein C42C1.9 (GenBank Accession Number
AF043695), which is encoded by a gene that is present in
the same operon as a gene encoding a mitochondrial
carrier protein. Since genes within the same operon are
often co-regulated and encode proteins involved in the
same physiological state, TANGO 182 may play a role in
metabolism. Thus, TANGO 182 nucleic acids and
polypeptides as well as antibodies directed against TANGO
182 may be useful in the diagnosis and treatment of
metabolic disorders. In addition, modulators of TANGO
182 expression or activity may be useful in the treatment
of such disorders.



TANGO 183 

The human TANGO 183 cDNA of SEQ ID NO: 4 has a 549
nucleotide open reading frame (SEQ ID NO: 15) encoding a
183 amino acid protein (SEQ ID NO: 26). The cDNA and
protein sequences of human TANGO 183 are shown in Figure
7.

Human TANGO 183 is predicted to be a transmembrane
protein having a 20 amino acid signal sequence (amino
acids 1 - 20 of SEQ ID NO:26; SEQ ID NO: 67) followed by a
163 amino acid mature protein (amino acids 21 - 183 of
SEQ ID NO: 26; SEQ ID NO: 79) having a 69 amino acid
extracellular domain (amino acids 21 - 89 of SEQ ID
NO: 26; SEQ ID NO: 88), a 23 amino acid transmembrane
domain (amino acids 90 - 112 of SEQ ID NO: 26; SEQ ID
NO: 94), and a 71 amino acid cytoplasmic domain (amino
acids 113 - 183 of SEQ ID NO: 26; SEQ ID NO: 102). There
are 8 conserved cysteines in the extracellular domain.
TANGO 183 has a high proportion of charged amino acids in
the predicted extracellular (18%, not including
histidines) and cytoplasmic (32%) domains. Human TANGO
183 is predicted to have a molecular weight of 20.6 kDa
prior to cleavage of its signal peptide and a molecular
weight of 18.1 kDa subsequent to cleavage of its signal
peptide.
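The charged-residue percentages above are fractions of residues within a domain: 18% for the extracellular domain (not including histidines) and 32% for the cytoplasmic domain. A minimal Python sketch follows. It assumes that "charged" means aspartate (D), glutamate (E), lysine (K), and arginine (R), with histidine (H) counted only when requested. That definition, the helper names charged_fraction and domain_slice, and the toy sequence are all assumptions made for illustration. Domain boundaries are taken as 1-based, inclusive residue numbers, matching how the ranges are written in the text (e.g., amino acids 21 - 89 of SEQ ID NO:26).

```python
CHARGED = frozenset("DEKR")  # assumed charged set (Asp, Glu, Lys, Arg)


def domain_slice(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq using 1-based, inclusive numbering."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("domain boundaries out of range")
    return seq[start - 1:end]


def charged_fraction(seq: str, include_histidine: bool = False) -> float:
    """Fraction of residues in seq that are treated as charged."""
    if not seq:
        raise ValueError("empty sequence")
    charged = (CHARGED | {"H"}) if include_histidine else CHARGED
    return sum(1 for aa in seq.upper() if aa in charged) / len(seq)


if __name__ == "__main__":
    toy = "MDEKRHAG"  # hypothetical sequence, not a TANGO 183 sequence
    print(charged_fraction(toy))                          # 0.5
    print(charged_fraction(toy, include_histidine=True))  # 0.625
```

Applied to SEQ ID NO:26 (not reproduced here), domain_slice(seq, 21, 89) and domain_slice(seq, 113, 183) would return the extracellular and cytoplasmic domains. Whether the results reproduce the stated percentages depends on whether the original analysis used the same definition of a charged residue.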

The murine TANGO 183 cDNA of SEQ ID NO:37 has a 549
nucleotide open reading frame (SEQ ID NO: 47) encoding a
183 amino acid protein (SEQ ID NO: 57). The cDNA and
protein sequences of murine TANGO 183 are shown in Figure
8.

Figure 25 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO: 26) and murine (SEQ
ID NO: 57) TANGO 183 (97.3% identity). Figure 35 depicts
an alignment of the cDNA sequences of human (SEQ ID NO: 4)
and murine (SEQ ID NO:37) TANGO 183 (71.7% identity).
The conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Northern analysis of human TANGO 183 mRNA expression
revealed the presence of a 1.6 kb transcript that is
expressed at a high level in brain, kidney, pancreas, and
heart; at a moderate level in liver and skeletal muscle;
and at a low level in placenta and lung.

The nucleic acid sequence of TANGO 183 is related to a
sequence tagged site at chromosomal location 11p15.4, and
TANGO 183 may map to this site.

The predicted cytoplasmic domain of TANGO 183 has a
relatively high proportion of charged residues (32%). This
suggests that TANGO 183 may non-covalently, e.g.,
electrostatically, associate with an intracellular
molecule such as a cytoskeletal component. Accordingly,
TANGO 183 may itself be involved in maintaining the
structural integrity of cells in which it is expressed.
If so, aberrant TANGO 183 protein or aberrantly regulated
TANGO 183 could be involved in alterations in cellular
morphology, e.g., alterations associated with metastasis.
Therefore, TANGO 183 nucleic acid molecules and
polypeptides as well as anti-TANGO 183 antibodies and
modulators of TANGO 183 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer, or cell migration, e.g., tumor metastasis.

TANGO 183 and TANGO 184 are related and may play
similar functional roles. Figure 43 depicts an alignment
of the amino acid sequences of human TANGO 184 (SEQ ID
NO:27) and human TANGO 183 (SEQ ID NO:26). Figure 44
depicts an alignment of the amino acid sequences of
murine TANGO 184 (SEQ ID NO: 58) and murine TANGO 183 (SEQ
ID NO:57).

TANGO 183 is related to C. elegans R12C12.6 (GenBank
Accession No. U23510).

TANGO 184 

The human TANGO 184 cDNA of SEQ ID NO: 5 has a 594
nucleotide open reading frame (SEQ ID NO: 16) encoding a
198 amino acid protein (SEQ ID NO: 27). The cDNA and
protein sequences of human TANGO 184 are shown in Figure
9.

Human TANGO 184 is predicted to be a transmembrane
protein having a 28 amino acid signal sequence (amino
acids 1 - 28 of SEQ ID NO: 27; SEQ ID NO: 68) followed by a
170 amino acid mature protein (amino acids 29 - 198 of
SEQ ID NO: 27; SEQ ID NO: 80) having a 74 amino acid
extracellular domain (amino acids 29 - 102 of SEQ ID NO:
27; SEQ ID NO: 89), a 23 amino acid transmembrane domain
(amino acids 103 - 125 of SEQ ID NO:27; SEQ ID NO:95),
and a 73 amino acid cytoplasmic domain (amino acids 126 -
198 of SEQ ID NO: 27; SEQ ID NO: 103). TANGO 184 has a
high proportion of charged amino acids in the predicted
extracellular (31%) and cytoplasmic (29%) domains.
Notably, the transmembrane domain includes charged
residues. Human TANGO 184 is predicted to have a
molecular weight of 22.5 kDa prior to cleavage of its
signal peptide and a molecular weight of 18.9 kDa
subsequent to cleavage of its signal peptide.

The murine TANGO 184 cDNA of SEQ ID NO: 38 has a 597
nucleotide open reading frame (SEQ ID NO: 48) encoding a
199 amino acid protein (SEQ ID NO: 58). The cDNA and
protein sequences of murine TANGO 184 are shown in Figure
10.

Figure 26 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO: 27) and murine (SEQ
ID NO:58) TANGO 184 (94.5% identity). Figure 36 depicts
an alignment of the cDNA sequences of human (SEQ ID NO: 5)
and murine (SEQ ID NO:38) TANGO 184 (63.8% identity).

Northern analysis of human TANGO 184 mRNA expression
revealed the presence of a 2 kb transcript that is
expressed at a high level in heart, brain, placenta,
skeletal muscle, kidney, and pancreas, and at a low level
in lung and liver. There are two alternative polyA
sites: nucleotide 1000 and nucleotide 2000.

In situ analysis of TANGO 184 expression in adult mice
revealed expression in the brain (moderate, ubiquitous
expression), spinal cord (weak expression in the region
of the grey matter), submandibular gland (strong,
ubiquitous expression), stomach (weak expression in the
muscle region), kidney (weak, ubiquitous expression in
the cortex and medulla, stronger expression in papilla),
adrenal gland (weak ubiquitous expression), thymus (weak
expression in cortex), lymph node (moderate ubiquitous
expression), spleen (weak expression in follicles),
skeletal muscle/smooth muscle (diaphragm), testis (strong
expression in the area surrounding the seminiferous
tubules), ovaries (weak expression), and placenta
(moderate, ubiquitous expression). This analysis did not
reveal significant expression in white fat, brown fat,
heart, lung, liver, pancreas, colon, small intestine, or
bladder. In embryonic tissue, this analysis revealed
expression at E13.5 (weak to moderate ubiquitous
expression with higher expression in the brain and
liver), E14.5 (weak to moderate ubiquitous expression
with higher expression in the brain and liver), E15.5
(moderate ubiquitous expression with higher expression in
the brain), E16.5 (weak to moderate ubiquitous expression
with higher expression in the brain, spinal cord, brown
fat, submandibular gland, lung, stomach, and intestines),
E18.5 (weak to moderate ubiquitous expression with higher
expression in the brain, spinal cord, brown fat,
submandibular gland, lung, stomach, and intestines), and
P1.5 (weak ubiquitous expression with higher expression in
brain, submandibular gland, olfactory epithelium, and
stomach).

The predicted cytoplasmic domain of TANGO 184 has a
relatively high proportion of charged residues (29%). This
suggests that TANGO 184 may non-covalently, e.g.,
electrostatically, associate with an intracellular
molecule such as a cytoskeletal component. Accordingly,
TANGO 184 may itself be involved in maintaining the
structural integrity of cells in which it is expressed.
If so, aberrant TANGO 184 protein or aberrantly regulated
TANGO 184 could be involved in alterations in cellular
morphology, e.g., alterations associated with metastasis.
Therefore, TANGO 184 nucleic acid molecules and
polypeptides as well as anti-TANGO 184 antibodies and
modulators of TANGO 184 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer, or cell migration, e.g., tumor metastasis.

TANGO 185 

The human TANGO 185 cDNA of SEQ ID NO: 6 has a 579
nucleotide open reading frame (SEQ ID NO: 17) encoding a
193 amino acid protein (SEQ ID NO:28). The cDNA and
protein sequences of human TANGO 185 are shown in Figure
11.

Human TANGO 185 is predicted to be a transmembrane
protein having a 24 amino acid signal sequence (amino
acids 1 - 24 of SEQ ID NO: 28; SEQ ID NO: 69) followed by a
169 amino acid mature protein (amino acids 25 - 193 of
SEQ ID NO: 28; SEQ ID NO: 81) having two extracellular
domains, one having 51 amino acids (amino acids 25 - 75
of SEQ ID NO:28; SEQ ID NO: 90), and a second having 19
amino acids (amino acids 132 - 150 of SEQ ID NO: 28; SEQ
ID NO: 91); three transmembrane domains, one having 27
amino acids (amino acids 76 - 102 of SEQ ID NO: 28; SEQ ID
NO: 96), a second having 22 amino acids (amino acids 110 -
131 of SEQ ID NO:28; SEQ ID NO:97), and the third having 24
amino acids (amino acids 151 - 174 of SEQ ID NO: 28; SEQ
ID NO: 98); and two cytoplasmic domains, one having 7
amino acids (amino acids 103 - 109 of SEQ ID NO: 28; SEQ
ID NO: 104), and a second having 19 amino acids (amino
acids 175 - 193 of SEQ ID NO: 28; SEQ ID NO: 105). The
predicted 22 amino acid transmembrane domain and the
predicted 24 amino acid domain, along with the predicted
7 amino acid cytoplasmic domain, may form one hydrophobic
domain that passes through the membrane twice. TANGO 185
is predicted to have a molecular weight of 21.4 kDa prior
to cleavage of its signal peptide and a molecular weight
of 18.8 kDa subsequent to cleavage of its signal peptide.
Notably, the transmembrane regions have charged residues.
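The TANGO 185 topology above is given as a set of 1-based, inclusive residue ranges. A segment spanning amino acids s to e therefore contains e - s + 1 residues; for example, 76 - 102 gives 27. The following Python sketch is offered only as an illustration, and its names (TANGO185_SEGMENTS, segment_length, tiling_problems) are hypothetical. It transcribes the predicted segments of SEQ ID NO:28 in order. It then checks two things: that each stated length follows from its boundaries, and that the segments cover residues 1 to 193 without gaps or overlaps.

```python
# Predicted segments of human TANGO 185 (SEQ ID NO:28), in sequence order,
# as (label, first residue, last residue), 1-based and inclusive.
TANGO185_SEGMENTS = [
    ("signal sequence", 1, 24),
    ("extracellular 1", 25, 75),
    ("transmembrane 1", 76, 102),
    ("cytoplasmic 1", 103, 109),
    ("transmembrane 2", 110, 131),
    ("extracellular 2", 132, 150),
    ("transmembrane 3", 151, 174),
    ("cytoplasmic 2", 175, 193),
]


def segment_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range."""
    return end - start + 1


def tiling_problems(segments, protein_length: int) -> list[str]:
    """Report gaps or overlaps between consecutive segments, and a wrong end point."""
    problems = []
    expected_start = 1
    for label, start, end in segments:
        if start != expected_start:
            problems.append(f"{label} starts at {start}, expected {expected_start}")
        expected_start = end + 1
    if expected_start - 1 != protein_length:
        problems.append(f"segments end at {expected_start - 1}, expected {protein_length}")
    return problems


if __name__ == "__main__":
    for label, start, end in TANGO185_SEGMENTS:
        print(f"{label}: {segment_length(start, end)} aa")
    print(tiling_problems(TANGO185_SEGMENTS, 193))  # []
```

The same table-driven check could be applied to the other transmembrane proteins described here. A shifted boundary in any one range would appear as a reported gap or overlap.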



WO 00/18904 



PCT/US99/22817 




The murine TANGO 185 cDNA of SEQ ID NO: 39 has a 579
nucleotide open reading frame (SEQ ID NO: 49) encoding a
193 amino acid protein (SEQ ID NO: 59). The cDNA and
protein sequences of murine TANGO 185 are shown in Figure
12.

Figure 27 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO: 28) and murine (SEQ
ID NO:59) TANGO 185 (90.7% identity). Figure 37 depicts
an alignment of the cDNA sequences of human (SEQ ID NO: 6)
and murine (SEQ ID NO:39) TANGO 185 (71.1% identity).
Human TANGO 185 maps to chromosome 6.

Northern analysis of human TANGO 185 mRNA expression
revealed the presence of a 2.2 kb major transcript and a
4.2 kb minor transcript. This analysis also revealed
that the 2.2 kb transcript is expressed at a high level
in heart, placenta, and pancreas; at a moderate level in
lung, liver, and kidney; and at a very low level, if at
all, in brain and skeletal muscle. The 4.2 kb transcript
is expressed at a low level in placenta.

In situ analysis of TANGO 185 expression in adult mice
revealed expression in the brain (choroid plexus),
submandibular gland (ubiquitous expression), white fat
(weak expression, possible mammary gland expression),
stomach (mucosal epithelium), kidney (medulla-cortex
transition and medullary rays), colon (weak expression in
the epithelium), small intestine (villi), thymus (low
level expression), bladder (mucosal epithelium), and
placenta (ubiquitous expression in decidua region). This
analysis did not reveal significant expression in adult
eye and Harderian gland, brown fat, heart, lung, liver,
spleen, pancreas, skeletal muscle, testes, or ovaries.

In situ analysis of TANGO 185 embryonic expression in
mice revealed expression at E13.5 (high level expression
in the skin and submaxillary gland and low level
ubiquitous expression in the liver) and E14.5 (high level
expression in the choroid plexus of the lateral and
fourth ventricles, skin, epithelium of the oral cavity,
follicles of vibrissae, submaxillary gland, stomach, and
heart; expression in lung (especially the developing
large airways) and liver (ubiquitous expression)). At
E15.5 the observed expression pattern is nearly identical
to that at E14.5, except that there is expression in the
region outlining the intestinal tract, and lung
expression is ubiquitous with higher expression in the
region outlining the large airways.

At E16.5 high level expression is observed in skin,
choroid plexus, the lining of the oral and nasal cavity,
esophagus, bladder, stomach, intestine, large vessels of
the heart, large airways of the lung, and the region
outlining the vertebrae. Lower ubiquitous expression is
present in the heart, lung, and thymus. A somewhat
higher, multifocal expression is present in the thymus.

At E18.5 the expression pattern is identical to that
observed at E16.5 except that expression is also observed
in developing hair follicles.

At P1.5 the expression pattern is identical to that
observed at E16.5 except that there is no longer
significant expression in the region outlining the
vertebrae.

The expression pattern of TANGO 185 during embryonic
development suggests that TANGO 185 expression is
strongly associated with squamous and mucosal epithelial
cells.

The expression pattern of TANGO 185 suggests that it may
be involved in cell development and/or cell differentiation.
Accordingly, TANGO 185 nucleic acid molecules and
polypeptides as well as anti-TANGO 185 antibodies and
modulators of TANGO 185 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer. There is evidence that TANGO 185 is expressed in
prostate cells. Thus, TANGO 185 nucleic acid molecules
and polypeptides as well as anti-TANGO 185 antibodies and
modulators of TANGO 185 expression or activity may be
useful in the treatment of prostate cancer.

TANGO 186 

The human TANGO 186 cDNA of SEQ ID NO: 7 has a 1149
nucleotide open reading frame (SEQ ID NO: 18) encoding a
383 amino acid protein (SEQ ID NO: 29). The cDNA and
protein sequences of human TANGO 186 are shown in Figure
13.

Human TANGO 186 is predicted to be a secreted protein
having a 20 amino acid signal sequence (amino acids 1 -
20 of SEQ ID NO:29; SEQ ID NO:70) followed by a 363 amino
acid mature protein (amino acids 21 - 383 of SEQ ID
NO: 29; SEQ ID NO: 82). There are eight cysteines in
mature TANGO 186. Some or all of these might be involved
in disulfide bond formation. Human TANGO 186 is
predicted to have a molecular weight of 43.0 kDa prior to
cleavage of its signal peptide and a molecular weight of
40.3 kDa subsequent to cleavage of its signal peptide.

The murine TANGO 186 cDNA of SEQ ID NO: 40 has a 1146
nucleotide open reading frame (SEQ ID NO: 50) encoding a
382 amino acid protein (SEQ ID NO: 60). The cDNA and
protein sequences of murine TANGO 186 are shown in Figure
14. Conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Figure 28 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO: 29) and murine (SEQ
ID NO:60) TANGO 186 (90.9% identity). Figure 38 depicts
an alignment of the cDNA sequences of human (SEQ ID NO: 7)
and murine (SEQ ID NO:40) TANGO 186 (41.6% identity).
The human and murine TANGO 186 proteins are highly
similar except within three portions: the signal
sequence, a hinge region at amino acids 108-123, and a
hinge region at amino acids 198-216. Within these three
portions the proteins are only about 50% identical.
Outside of these three portions the proteins are about
97.3% identical.

TANGO 186 maps to human chromosome 11q14.

Northern analysis of human TANGO 186 mRNA expression
revealed the presence of a 1.8 kb transcript and a 4 kb
transcript. Both transcripts are expressed at a low
level in heart, lung, liver, skeletal muscle, kidney, and
pancreas and at a very low level in brain.

In situ analysis of TANGO 186 in adult mice revealed
that TANGO 186 is expressed in brain (olfactory bulb),
spleen (low level ubiquitous signal), small intestine
(very strong signal in villi and submucosa), colon
(ubiquitous signal), kidney (cortical and medullary
region), lung (bronchial epithelium), eye (iris and
cornea), and placenta (strong signal in the outer
membrane). This analysis did not detect expression in
adult pancreas, heart, skeletal muscle, diaphragm,
esophagus, liver, or thymus.

In situ expression analysis of murine embryonic
sagittal sections revealed expression at stage E13.5 in
epithelium of the lower and upper lip, cartilage
primordium of basisphenoid bone, cartilage condensation
of sacral vertebral body (centrum), small intestine, and
heart. At stage E14.5, in addition to the expression
observed at stage E13.5, expression was also observed in
eye (or cartilage around eye), Meckel's cartilage, and
cartilage of the limb digits. At stage E15.5 expression
was observed in vibrissae of the snout, kidney (embryonic
glomeruli), cartilage of the limb digits, cartilage of
the vertebral column, heart, eye, and small intestine.

At stage E16.5 the observed expression pattern was
similar to that observed at E15.5, but there was a
notable reduction in signal from cartilage, epithelium of
upper and lower lip, and heart. Also at stage E16.5 low
level signal was observed in the lung, and a strong
signal was still observed in the small intestine. At
stage E17.5 expression of TANGO 186 was observed to be
more ubiquitous. However, expression in cartilage was
observed to decrease with the exception of ossification
within cartilage primordium of body of mandible. At
stage E17.5 strong expression continued to be observed in
the small intestine. The expression pattern at stage
P1.5 was observed to be very similar to that observed at
stage E17.5, with expression being nearly ubiquitous with
the notable exceptions of the brain and spinal cord, in
which little or no expression was observed. At stage
P1.5 the highest expression observed was in the small
intestine, lung, and kidney.

Overall, the in situ expression analysis of adult and
embryonic tissue revealed that expression is first
observed in the developing cartilage, small intestine,
and heart, with the cartilage expression being most
striking in the developing vertebral column and jaw area.
Strong expression in the cartilage of the vertebral
column and developing digits was observed through stage
E16.5. Subsequently, cartilage expression was observed
to decrease, with some exceptions in the jaw area. Other
embryonic tissues in which the observed expression was
notable include the kidney, specifically the embryonic
glomeruli, and the lung. These tissues continue to have
strong expression in the adult, with expression in the
kidney also being observed in the medullary region and
lung expression becoming restricted to the bronchial
epithelium. Expression of TANGO 186 becomes more
ubiquitous through P1.5, with the most noticeable
exception being the brain and spinal cord. In the adult,
however, signal is observed in the olfactory bulb.

In a murine LPS disease model, increased TANGO 186
expression was observed in the brain 2 and 8 hours after
LPS treatment. Decreased TANGO 186 expression was
observed at these same time points in the kidney. TANGO
186 expression was also observed in the gastric mucosa.

As discussed above, murine in situ expression analysis
demonstrates that TANGO 186 is expressed in cartilage
throughout the embryo, suggesting that TANGO 186 is a
regulatory molecule that plays a role in bone formation
(e.g., condensation of cartilage). Accordingly, TANGO
186 nucleic acid molecules and polypeptides as well as
anti-TANGO 186 antibodies and modulators of TANGO 186
expression or activity may be useful in the diagnosis and
treatment of bone and cartilage disorders (e.g.,
osteogenesis imperfecta and broken bones, cartilage
degradation, and bone degradation). Moreover, many bone
morphogenetic proteins and TGF-β family members are
regulated by extracellular proteins, e.g., noggin and
chordin. Thus, TANGO 186, a secreted protein that is
expressed in the heart, may play a role in heart
development, and TANGO 186 nucleic acid molecules and
polypeptides as well as anti-TANGO 186 antibodies and
modulators of TANGO 186 expression or activity may be
useful in the diagnosis and treatment of developmental
disorders of the heart, e.g., valve malformation.

There is some sequence similarity between TANGO 186 and
a Bacillus serine protease. Thus, TANGO 186 may have
serine protease activity.

TANGO 188 

The human TANGO 188 cDNA of SEQ ID NO: 8 has a 792
nucleotide open reading frame (SEQ ID NO: 19) encoding a
264 amino acid protein (SEQ ID NO: 30). The cDNA and
protein sequences of human TANGO 188 are shown in Figure
15.

Human TANGO 188 is predicted to be a secreted protein
having a 23 amino acid signal sequence (amino acids 1 -
23 of SEQ ID NO:30; SEQ ID NO:71) followed by a 241 amino
acid mature protein (amino acids 24 - 264 of SEQ ID
NO:30; SEQ ID NO:83). Human TANGO 188 is predicted to
have a molecular weight of 29.5 kDa prior to cleavage of
its signal peptide.

The murine TANGO 188 cDNA of SEQ ID NO:41 has an 807
nucleotide open reading frame (SEQ ID NO: 51) encoding a
269 amino acid protein (SEQ ID NO: 61). The cDNA and
protein sequences of murine TANGO 188 are shown in Figure
16.

Figure 29 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO: 30) and murine (SEQ
ID NO:61) TANGO 188 (80.5% identity). Figure 39 depicts
an alignment of the cDNA sequences of human (SEQ ID NO: 8)
and murine (SEQ ID NO:41) TANGO 188 (71.8% identity).

TANGO 188 maps to human chromosome 16p13.3.

Northern analysis of human TANGO 188 mRNA expression
revealed the presence of a 2.0 kb transcript that is
expressed at a low level in heart and pancreas and at a
very low level, if at all, in brain, placenta, lung,
liver, skeletal muscle, and kidney.

In situ analysis of TANGO 188 expression in adult mice
did not detect significant expression in the bladder,
placenta, pancreas, eye, heart, liver, thymus, spleen,
kidney, lung, brain, skeletal muscle/diaphragm, colon, or
small intestine. In situ analysis of TANGO 188
expression in embryos revealed no significant expression
at E13.5, E14.5, E15.5, E16.5, E17.5, or P1.5. However,
in the case of both adult mice and embryos, expression of
TANGO 188 may have been obscured by a high background
signal.

TANGO 188 is transcribed in an antisense relationship
to NY-CO-7, an antigen identified in human colon cancer
(Scanlon et al. (1998) Int. J. Cancer 76:652-58).
Accordingly, TANGO 188 may have utility as a marker
for colon cancer, and TANGO 188 nucleic acid molecules
and polypeptides as well as anti-TANGO 188 antibodies and
modulators of TANGO 188 expression or activity may be
useful in the diagnosis and treatment of colon cancer or
other types of cancer.

The gene encoding the C. elegans homologue of NY-CO-7
is present in the same operon as a gene encoding a
mitochondrial import protein. Since genes within the
same operon are often co-regulated and encode proteins
involved in the same physiological state, TANGO 188 may
be a mitochondrial import protein or may be involved in
some other mitochondrial function. Thus, TANGO 188
nucleic acids and polypeptides as well as antibodies
directed against TANGO 188 and modulators of TANGO 188
expression or activity may be useful in the diagnosis and
treatment of disorders associated with defects in
mitochondrial function.

TANGO 188 appears to be the homologue of a C. elegans
protein whose gene is present in the same operon as a gene
encoding a protein that bears some similarity to SnF8p, a
yeast zinc finger protein that is likely a transcription
factor involved in expression of genes encoding certain
proteins involved in respiration and metabolism. Since
genes within the same operon are often co-regulated and
encode proteins involved in the same physiological state,
TANGO 188 may play a role in respiration or metabolism.
Thus, TANGO 188 nucleic acids and polypeptides as well as
antibodies directed against TANGO 188 and modulators of
TANGO 188 expression or activity may be useful in the
diagnosis and treatment of disorders associated with
defects in cell respiration or metabolism.

TANGO 189

The human TANGO 189 cDNA of SEQ ID NO: 9 has a 759
nucleotide open reading frame (SEQ ID NO: 20) encoding a
253 amino acid protein (SEQ ID NO: 31). The cDNA and
protein sequences of human TANGO 189 are shown in Figure
17.

The human TANGO 189 cDNA described above (SEQ ID NO: 9;
Figure 17) represents one splice variant of TANGO 189
(splice variant 1A). There exists a second splice
variant of human TANGO 189 (splice variant 1B). The cDNA
sequence of this splice variant is the same as the cDNA
sequence of human TANGO 189 described above, except that
nucleotides 674-1087 are missing. This splice variant
cDNA encodes a 184 amino acid protein having a predicted
molecular weight of 21.1 kDa prior to cleavage of the
predicted signal sequence. Both splice variant 1A and
splice variant 1B appear to arise from a 2.1 kb
transcript which is 2055 nucleotides long, not including
the polyA sequence. This transcript encodes a 253 amino
acid protein having a predicted molecular weight of 28.6
kDa, not including the predicted signal sequence.
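Splice variant 1B is described as lacking nucleotides 674-1087 of the splice variant 1A cDNA. Counting both end points, that span is 1087 - 674 + 1 = 414 nucleotides, a multiple of three (138 codons). The Python sketch below illustrates the deletion with hypothetical helpers, delete_range and inclusive_span, using 1-based, inclusive coordinates as in the text. It operates on a toy string because the SEQ ID NO:9 sequence is not reproduced here. It makes no assumption about where the deleted span falls relative to the open reading frame, which this section does not state.

```python
def delete_range(seq: str, start: int, end: int) -> str:
    """Remove positions start..end (1-based, inclusive) from seq."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range out of bounds")
    return seq[:start - 1] + seq[end:]


def inclusive_span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


if __name__ == "__main__":
    span = inclusive_span(674, 1087)
    print(span, span % 3 == 0)               # 414 True
    print(delete_range("ABCDEFGHIJ", 3, 5))  # ABFGHIJ
```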

The 2.1 kb TANGO 189 transcript encodes a human TANGO
189 protein that is predicted to be a transmembrane
protein having a 24 or 25 amino acid signal sequence
(amino acids 1 - 24 or 1 - 25 of SEQ ID NO: 31; SEQ ID NO: 72
and SEQ ID NO: 73) followed by a 227 or 226 amino acid
mature protein (amino acids 25 - 251 or 26 - 251 of SEQ
ID NO:31; SEQ ID NO:84 and SEQ ID NO:85) having a first
extracellular domain of 114 or 113 amino acids (amino
acids 25 - 138 or 26 - 138 of SEQ ID NO:31; SEQ ID NO: 92
and SEQ ID NO: 93), followed by a first transmembrane
domain (amino acids 139 - 164 of SEQ ID NO:31; SEQ ID
NO: 99), a first cytoplasmic domain (amino acids 165 - 177
of SEQ ID NO: 31; SEQ ID NO: 106), a second transmembrane
domain (amino acids 178 - 195 of SEQ ID NO: 31; SEQ ID
NO: 100), a second extracellular domain (amino acids 196 -
211 of SEQ ID NO:31; SEQ ID NO:108), a third
transmembrane domain (amino acids 212 - 237 of SEQ ID
NO: 31; SEQ ID NO: 101), and a second cytoplasmic domain
(amino acids 238 - 253 of SEQ ID NO: 31; SEQ ID NO: 107).
The protein encoded by this 2.1 kb TANGO 189 transcript
is predicted to have a molecular weight of 21.8 kDa prior
to cleavage of its signal peptide and a molecular weight
of 25.2 kDa subsequent to cleavage of its signal peptide.

The predicted domain structure of the protein encoded by
splice variant 1A is identical to that of the protein
encoded by the 2.1 kb transcript up to amino acid 181.
The predicted domain structure of the protein encoded by
splice variant 1B is identical to that of the protein
encoded by the 2.1 kb transcript up to amino acid 180.

The murine TANGO 189 cDNA of SEQ ID NO: 42 has a 759
nucleotide open reading frame (SEQ ID NO: 52) encoding a
253 amino acid protein (SEQ ID NO: 62). The cDNA and
protein sequences of murine TANGO 189 are shown in Figure
18.

Figure 30 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO: 31; splice variant
1A) and murine (SEQ ID NO:62) TANGO 189 (91.7% identity).
Figure 40 depicts an alignment of the cDNA sequences of
human (SEQ ID NO: 9; splice variant 1A) and murine (SEQ ID
NO:42) TANGO 189 (51.8% identity).

Northern analysis of human TANGO 189 mRNA expression
revealed the presence of one major transcript (2.1 kb)
and four minor transcripts (3.4 kb, 4.2 kb, 6 kb, and 7
kb). The 2.1 kb transcript is expressed at a high level
in brain, spinal cord, and testis; at a low level in
heart, placenta, skeletal muscle, kidney, pancreas, lung,
thyroid, lymph node, trachea, adrenal, bone marrow,
spleen, ovary, and prostate; and at a very low level in
liver, stomach, thymus, small intestine, colon, and
peripheral blood lymphocytes. The 3.4 kb, 4.2 kb, 6 kb,
and 7 kb transcripts are expressed at a moderate level in
brain and spinal cord and are not expressed in testis.
The 4.6 and 7 kb transcripts are expressed at a moderate
level in peripheral blood lymphocytes.

Murine in situ expression analysis revealed that TANGO 189 is expressed strongly and almost ubiquitously in the mouse embryo. The tissues with the highest expression during embryogenesis are the brain, spinal cord, and small intestine. Expression decreases in most, if not all, tissues by postnatal day 1.5, but the tissues of highest expression remain the brain, spinal cord, and small intestine. This pattern continues into the adult mouse, with expression in most tissues decreasing even more, some to background levels. Of the adult tissues tested, the brain, spleen, small intestine, and retina have the highest signal. High-level expression is observed in the following adult tissues: placenta (ubiquitous), small intestine (except villi), eye (retina), and brain (ubiquitous). Lower expression is observed in bladder (stronger signal in the transitional epithelium), kidney, thymus, liver, placenta, spleen, and colon. Expression was not observed in heart, skeletal muscle, diaphragm, lung, and pancreas. Embryonic expression was observed at stages E13.5 through E17.5 (high ubiquitous signal; brain, spinal cord, and small intestine have the strongest signal) and P1.5 (ubiquitous signal decreased in intensity; brain, spinal cord, small intestine, and kidney have the strongest signal).

TANGO 189 is useful as a tissue-specific marker. The expression of TANGO 189 may be altered in a variety of disease states (e.g., cancer). Thus, TANGO 189 nucleic acid molecules and polypeptides as well as anti-TANGO 189 antibodies and modulators of TANGO 189 expression or activity may be useful in the diagnosis and treatment of disorders of cell proliferation and differentiation.

TANGO 215 

The human TANGO 215 cDNA of SEQ ID NO: 10 has a 2160-nucleotide open reading frame (SEQ ID NO: 21) encoding a 720 amino acid protein (SEQ ID NO: 32). The cDNA and protein sequences of human TANGO 215 are shown in Figure 19.

The cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of a full-length murine TANGO 215 clone are shown in Figure 56.

Human TANGO 215 is predicted to be a wholly secreted protein having a 21 amino acid signal sequence (amino acids 1-21 of SEQ ID NO: 32; SEQ ID NO: 74) followed by a 699 amino acid mature protein (amino acids 22-720 of SEQ ID NO: 32; SEQ ID NO: 86). TANGO 215 is predicted to have a molecular weight of 80.3 kDa prior to cleavage of its signal peptide and a molecular weight of 77.6 kDa subsequent to cleavage of its signal peptide.
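Residue ranges such as "amino acids 1-21" and "amino acids 22-720" are 1-based and inclusive, so the signal sequence and mature protein together account for all 720 residues. The sketch below shows how such ranges map onto Python's 0-based, end-exclusive slices. It uses a placeholder sequence in place of SEQ ID NO: 32, which is not reproduced here, and `residues` and `split_signal_peptide` are hypothetical helper names.

```python
def residues(seq: str, start: int, end: int) -> str:
    """Residues start..end of seq, using 1-based inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[start - 1:end]


def split_signal_peptide(seq: str, signal_len: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, mature protein)."""
    return residues(seq, 1, signal_len), residues(seq, signal_len + 1, len(seq))


# Placeholder 720-residue precursor (not the real SEQ ID NO: 32).
precursor = "A" * 720
signal, mature = split_signal_peptide(precursor, 21)
print(len(signal), len(mature))
```

With a 21-residue signal sequence, the split leaves a 699-residue mature protein, as stated above.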

TANGO 215 is related to the C1r/C1s (C1q) and MASP1/MASP2 (mannose-binding lectin-associated serine protease) proteases, all of which are involved in the complement pathways of the immune response.

TANGO 215 may be a threonine protease. There is a threonine in the sequence TGG at amino acids 664-666 of human and murine TANGO 215. This sequence is within a region having similarity to the active site of certain proteases. Human TANGO 215 is predicted to have a CUB domain (amino acids 128-236 of SEQ ID NO: 32), an EGF domain (amino acids 239-271 of SEQ ID NO: 32), a small consensus repeat (SCR) domain (amino acids 280-342 of SEQ ID NO: 32), a partial SCR domain (amino acids 408-442 of SEQ ID NO: 32), and a serine protease domain (amino acids 461-720 of SEQ ID NO: 32).
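Because the domain boundaries above are inclusive 1-based ranges, a domain's length is end minus start plus one; for example, the CUB domain spans 236 - 128 + 1 = 109 residues. The sketch below tabulates the predicted human TANGO 215 domains and checks that none of them overlap. The data structure and helper names are illustrative assumptions, not part of the source.

```python
# Predicted human TANGO 215 domains: (name, first residue, last residue), inclusive.
TANGO215_DOMAINS = [
    ("CUB", 128, 236),
    ("EGF", 239, 271),
    ("SCR", 280, 342),
    ("partial SCR", 408, 442),
    ("serine protease", 461, 720),
]


def domain_length(start: int, end: int) -> int:
    # Inclusive range: both endpoints are counted.
    return end - start + 1


def overlapping(domains):
    """Neighbouring domains (ordered by start) that overlap.

    The result is empty exactly when no two domains overlap at all.
    """
    ordered = sorted(domains, key=lambda d: d[1])
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:]) if b[1] <= a[2]]


for name, start, end in TANGO215_DOMAINS:
    print(f"{name}: {domain_length(start, end)} aa")
print(overlapping(TANGO215_DOMAINS))
```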

Northern analysis of human TANGO 215 mRNA expression revealed the presence of a 2.7 kb transcript in heart, brain, and placenta.

In situ analysis of TANGO 215 expression in adult mice revealed expression in the brain (cortex and caudate putamen), kidney (cortex, most likely within the glomeruli), bladder (ubiquitous expression), liver (possibly within vessels), and placenta (outer membrane region). This analysis did not detect expression in the lung, small intestine, pancreas, thymus, eye, heart, or muscle/diaphragm.

In situ analysis of TANGO 215 in embryos revealed expression at E13.5 in developing limbs and vertebrae. At E14.5 the observed expression pattern was similar to that at E13.5 except that expression was also observed in the muscle surrounding the abdomen, the skin, and the jaw. At E15.5 expression was observed in the developing kidney and bladder and in the outer layer of the tongue. At later ages, E16.5 through P1.5, expression is observed in the smooth muscle layer of the small intestine, the portal regions of the liver, and the large airways of the lungs. Expression in the brain is absent until E18.5, when expression is apparent in the caudate putamen. Expression remains strong at P1.5 in the vertebrae, tail, and sternum and possibly in the muscle between developing bones.

The region of human TANGO 215 from amino acid 280 to the end is predicted to be the human homologue of Limulus Factor C (27% identity). Thus, this region of TANGO 215 is predicted to include an effector domain (serine protease domain) and, perhaps, an LPS-sensing domain. TANGO 215 may therefore sense and respond to LPS, with the response to the presence of LPS being activation of serine protease activity. Accordingly, TANGO 215 nucleic acids and polypeptides as well as antibodies directed against TANGO 215 and modulators of TANGO 215 expression or activity may be useful in the diagnosis and treatment of sepsis.

CUB domains are extracellular domains of about 110 amino acids. CUB domains are found in functionally diverse, mostly developmentally regulated proteins. Most contain four cysteines that are involved in two disulfide bonds (C1-C2 and C3-C4). SCR domains are also known as complement control protein (CCP) modules. EGF domains are commonly involved in receptor-ligand interactions. CUB, EGF, and SCR domains are commonly involved in protein-protein interactions. Because these domains are present in TANGO 215, it is predicted to interact with one or more other proteins. The presence of these domains in TANGO 215 suggests that TANGO 215 is involved in development, perhaps in bone and cartilage morphogenesis. TANGO 215 nucleic acid molecules and polypeptides as well as anti-TANGO 215 antibodies and modulators of TANGO 215 expression or activity may be useful in the treatment of developmental disorders.

TANGO 187

The human TANGO 187-1/3 cDNA of SEQ ID NO: 11 has a 1032-nucleotide open reading frame (SEQ ID NO: 22) encoding a 343 amino acid protein (SEQ ID NO: 33). The cDNA and protein sequences of human TANGO 187-1/3 are shown in Figure 20.

Human TANGO 187-1/3 is predicted to be a wholly secreted protein having a 20 amino acid signal sequence (amino acids 1-20 of SEQ ID NO: 33; SEQ ID NO: 75) followed by a 323 amino acid mature protein (amino acids 21-343 of SEQ ID NO: 33; SEQ ID NO: 87). Human TANGO 187-1/3 is predicted to have a molecular weight of 37.5 kDa prior to cleavage of its signal peptide and a molecular weight of 35.9 kDa subsequent to cleavage of its signal peptide.

The TANGO 187-1/3 cDNA described above actually represents one of 8 different TANGO 187 splice variants. Each variant contains none, one, two, or all three of three variant regions. These regions are referred to as region 1, region 2, and region 3, and each of the various forms of TANGO 187 is named by reference to the variant regions it contains. Thus, the form of TANGO 187 described above is TANGO 187-1/3 because it includes regions 1 and 3.
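Because each of the three variant regions is independently present or absent, the naming rule above yields 2^3 = 8 forms, matching the 8 splice variants described. The hypothetical helper below enumerates the names that rule produces; it illustrates the naming convention only and makes no claim about which variants are expressed.

```python
from itertools import combinations


def tango187_variant_names(regions: tuple[int, ...] = (1, 2, 3)) -> list[str]:
    """Names produced by appending the included regions, joined by '/', to 'TANGO 187'."""
    names = []
    for k in range(len(regions) + 1):          # 0, 1, 2, or 3 regions present
        for combo in combinations(regions, k):
            suffix = "/".join(str(r) for r in combo)
            names.append("TANGO 187" + (f"-{suffix}" if suffix else ""))
    return names


variants = tango187_variant_names()
print(len(variants))
print(variants)
```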

Figure 46 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187-1.

Figure 47 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187-2/3.

Figure 48 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187-1/2/3.

Figure 49 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187-1/2.

Figure 50 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187-2.

Figure 51 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187-3.

Figure 52 depicts the cDNA sequence (SEQ ID NO: ) and predicted amino acid sequence (SEQ ID NO: ) of TANGO 187. This form does not include any of the three variant regions.

The murine TANGO 187 cDNA of SEQ ID NO: 43 is only a partial sequence. This cDNA has an open reading frame extending from nucleotide 73 to the end of the available sequence (SEQ ID NO: 53), encoding a 152 amino acid protein (SEQ ID NO: 63). The partial cDNA and protein sequences of murine TANGO 187 are shown in Figure 21.

Figure 31 depicts an alignment of the predicted amino acid sequences of human (SEQ ID NO: 33) and murine (SEQ ID NO: 63; partial) TANGO 187 (50.4% identity). Figure 41 depicts an alignment of the cDNA sequences of human (SEQ ID NO: 11) and murine (SEQ ID NO: 43; partial) TANGO 187 (66.0% identity).

Northern analysis of human TANGO 187 mRNA expression revealed the presence of 1.3 and 2.4 kb transcripts that are approximately equally expressed at a low level in heart, brain, lung, liver, and smooth muscle and at a moderate level in kidney and placenta.

In situ analysis of TANGO 187 expression in adult mice revealed that TANGO 187 is expressed in brain (weak, ubiquitous signal), eye and harderian gland (weak signal in the retina), submandibular gland (weak, ubiquitous signal), stomach (weak, ubiquitous signal), kidney (weak, ubiquitous signal), adrenal gland (low-level, ubiquitous expression), colon (low-level, ubiquitous expression), small intestine (low-level, ubiquitous expression), thymus (moderate-level, ubiquitous expression in the cortical region with lower expression in the medulla), lymph node (ubiquitous expression), spleen (low-level ubiquitous expression with lower expression in the follicles), bladder (moderate expression in the mucosal epithelium), and testes (moderate, ubiquitous signal that defines the seminiferous tubules). In this analysis, TANGO 187 expression was not detectable in the spinal cord, brown fat, heart, lung, liver, pancreas, skeletal muscle, or ovaries.

In situ analysis of TANGO 187 expression in embryos at E13.5 revealed ubiquitous expression, with the strongest expression in the brain and spinal cord. A punctate expression pattern was observed in the lungs, suggestive of higher expression in the developing large airways. At E14.5 the expression pattern was similar to that observed at E13.5 except that expression was observed in the developing olfactory system and the eye at a level similar to that observed in the brain and spinal cord. Expression is also present at E14.5 in the epithelium of the tongue, the dermis of the snout, the kidneys, and the stomach. At E15.5 low-level ubiquitous expression was observed, with the highest expression in the brain, spinal cord, eye, and olfactory system. Slightly lower expression was observed in the lung (ubiquitous expression) and kidney (cortical region) than in the aforementioned neuronal tissues. At E16.5 the observed expression pattern is identical to that seen at E15.5 except that TANGO 187 expression is observed in the thymus and the mucosal portion of the stomach. At E18.5 TANGO 187 expression continues to be highest in neuronal tissue, with lower expression in the hind brain and spinal cord than in the forebrain, and with the neopallial cortex having the highest signal. At E16.5 expression is observed in the thymus and small intestine. At P1.5 the observed expression pattern is nearly identical to that at E18.5 except that expression in the lung and stomach has decreased. At P1.5 expression is highest in the brain, eye, olfactory epithelium, and kidney.

TANGO 187 contains a region moderately similar to an armadillo/beta-catenin repeat. Such repeats are thought to be involved in protein-protein interactions.

TABLE 1: Summary of Human TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184, TANGO 185, TANGO 186, TANGO 187, TANGO 188, TANGO 189, and TANGO 215 Sequence Information.

Gene              cDNA            ORF             Protein         Fig.      Accession No.
TANGO 180         SEQ ID NO: 1    SEQ ID NO: 12   SEQ ID NO: 23   Fig. 1    ATCC 98900
TANGO 181         SEQ ID NO: 2    SEQ ID NO: 13   SEQ ID NO: 24   Fig. 3    ATCC 98900
TANGO 182         SEQ ID NO: 3    SEQ ID NO: 14   SEQ ID NO: 25   Fig. 5    ATCC 98900
TANGO 183         SEQ ID NO: 4    SEQ ID NO: 15   SEQ ID NO: 26   Fig. 7    ATCC 98900
TANGO 184         SEQ ID NO: 5    SEQ ID NO: 16   SEQ ID NO: 27   Fig. 9    ATCC 98900
TANGO 185         SEQ ID NO: 6    SEQ ID NO: 17   SEQ ID NO: 28   Fig. 11   ATCC 98901
TANGO 186         SEQ ID NO: 7    SEQ ID NO: 18   SEQ ID NO: 29   Fig. 13   ATCC 98901
TANGO 188         SEQ ID NO: 8    SEQ ID NO: 19   SEQ ID NO: 30   Fig. 15   ATCC 98901
TANGO 189         SEQ ID NO: 9    SEQ ID NO: 20   SEQ ID NO: 31   Fig. 17   ATCC 98901
TANGO 215         SEQ ID NO: 10   SEQ ID NO: 21   SEQ ID NO: 32   Fig. 19   ATCC 98899
TANGO 187-1/3     SEQ ID NO: 11   SEQ ID NO: 22   SEQ ID NO: 33   Fig. 20   ATCC 98901
TANGO 187-1       SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 46   ATCC
TANGO 187-2/3     SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 47   ATCC
TANGO 187-1/2/3   SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 48   ATCC
TANGO 187-1/2     SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 49   ATCC
TANGO 187-2       SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 50   ATCC
TANGO 187-3       SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 51   ATCC

TABLE 2: Summary of Domains of Human TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184, TANGO 185, TANGO 186, TANGO 187, TANGO 188, TANGO 189, and TANGO 215.

Protein     Signal Sequence   Mature Protein   Extracellular Domain   Transmembrane Domain   Cytoplasmic Domain
TANGO 180   aa 1-22
            SEQ ID

TABLE 3: Summary of Murine TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184, TANGO 185, TANGO 186, TANGO 188, TANGO 189, and TANGO 187 Sequence Information

Gene                  cDNA            ORF             Protein         Figure    AA align. with human   NA align. with human
TANGO 180             SEQ ID NO: 34   SEQ ID NO: 44   SEQ ID NO: 54   Fig. 2    Fig. 22                Fig. 32
TANGO 181 (partial)   SEQ ID NO: 35   SEQ ID NO: 45   SEQ ID NO: 55   Fig. 4    Fig. 23                Fig. 33
TANGO 182 (partial)   SEQ ID NO: 36   SEQ ID NO: 46   SEQ ID NO: 56   Fig. 6    Fig. 24                Fig. 34
TANGO 183             SEQ ID NO: 37   SEQ ID NO: 47   SEQ ID NO: 57   Fig. 8    Fig. 25                Fig. 35
TANGO 184             SEQ ID NO: 38   SEQ ID NO: 48   SEQ ID NO: 58   Fig. 10   Fig. 26                Fig. 36
TANGO 185             SEQ ID NO: 39   SEQ ID NO: 49   SEQ ID NO: 59   Fig. 12   Fig. 27                Fig. 37
TANGO 186             SEQ ID NO: 40   SEQ ID NO: 50   SEQ ID NO: 60   Fig. 14   Fig. 28                Fig. 38
TANGO 188             SEQ ID NO: 41   SEQ ID NO: 51   SEQ ID NO: 61   Fig. 16   Fig. 29                Fig. 39
TANGO 189             SEQ ID NO: 42   SEQ ID NO: 52   SEQ ID NO: 62   Fig. 18   Fig. 30                Fig. 40
TANGO 187 (partial)   SEQ ID NO: 43   SEQ ID NO: 53   SEQ ID NO: 63   Fig. 21   Fig. 31                Fig. 41
TANGO 181             SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 53
TANGO 182             SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 54
TANGO 187             SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 55
TANGO 215             SEQ ID NO:      SEQ ID NO:      SEQ ID NO:      Fig. 56






Various aspects of the invention are described in further detail in the following subsections.

I. Isolated Nucleic Acid Molecules 

One aspect of the invention pertains to isolated nucleic acid molecules that encode a polypeptide of the invention or a biologically active portion thereof, as well as nucleic acid molecules sufficient for use as hybridization probes to identify nucleic acid molecules encoding a polypeptide of the invention and fragments of such nucleic acid molecules suitable for use as PCR primers for the amplification or mutation of nucleic acid molecules. As used herein, the term "nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA.

An "isolated" nucleic acid molecule is one which is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid molecule. Preferably, an "isolated" nucleic acid molecule is free of sequences (preferably protein-encoding sequences) which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.

A nucleic acid molecule of the present invention, e.g., a nucleic acid molecule having the nucleotide sequence of any of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901, or a complement thereof, can be isolated using standard molecular biology techniques and the sequence information provided herein. Using all or a portion of the nucleic acid sequences of any of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901 as a hybridization probe, nucleic acid molecules of the invention can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook et al., eds., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989).

A nucleic acid molecule of the invention can be amplified using cDNA, mRNA, or genomic DNA as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to all or a portion of a nucleic acid molecule of the invention can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.

In another preferred embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule which is a complement of the nucleotide sequence shown in SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901, or a portion thereof. A nucleic acid molecule which is complementary to a given nucleotide sequence is one which is sufficiently complementary to the given nucleotide sequence that it can hybridize to the given nucleotide sequence, thereby forming a stable duplex.

Moreover, a nucleic acid molecule of the invention can comprise only a portion of a nucleic acid sequence encoding a full-length polypeptide of the invention, for example, a fragment which can be used as a probe or primer or a fragment encoding a biologically active portion of a polypeptide of the invention. The nucleotide sequence determined from the cloning of one gene allows for the generation of probes and primers designed for use in identifying and/or cloning homologues in other cell types, e.g., from other tissues, as well as homologues from other mammals. The probe/primer typically comprises a substantially purified oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 12, preferably about 25, more preferably about 50, 75, 100, 125, 150, 175, 200, 250, 300, 350 or 400 consecutive nucleotides of the sense or anti-sense sequence of any of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901, or of a naturally occurring mutant of any of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901.
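The probes and primers described above correspond to runs of consecutive nucleotides of a sense or antisense sequence. As a minimal sketch, the hypothetical function below lists every window of a chosen length along a sequence; the default length of 25 mirrors the "preferably about 25" figure above, and the inputs are toy sequences rather than any SEQ ID NO. Choosing among windows for actual probe design is outside the scope of this sketch.

```python
def candidate_probes(seq: str, length: int = 25) -> list[str]:
    """Every run of `length` consecutive nucleotides in seq, read 5' to 3'."""
    seq = seq.upper()
    if not 0 < length <= len(seq):
        raise ValueError("window length must be between 1 and len(seq)")
    # Sliding window over start positions 0 .. len(seq) - length.
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


toy = "ACGT" * 8                   # 32-nt toy sequence, not from any SEQ ID NO
windows = candidate_probes(toy)    # 25-nt windows
print(len(windows))
print(candidate_probes("ACGTACGTACGTA", 12))
```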

Probes based on the sequence of a nucleic acid molecule of the invention can be used to detect transcripts or genomic sequences encoding the same protein molecule encoded by a selected nucleic acid molecule. The probe comprises a label group attached thereto, e.g., a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used as part of a diagnostic test kit for identifying cells or tissues which mis-express the protein, such as by measuring levels of a nucleic acid molecule encoding the protein in a sample of cells from a subject, e.g., detecting mRNA levels or determining whether a gene encoding the protein has been mutated or deleted.

A nucleic acid fragment encoding a "biologically active portion" of a polypeptide of the invention can be prepared by isolating a portion of any of SEQ ID NOs: 1-22, 34-43, and - or the nucleotide sequence of the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901 which encodes a polypeptide having a biological activity, expressing the encoded portion of the polypeptide (e.g., by recombinant expression in vitro), and assessing the activity of the encoded portion of the polypeptide.

The invention further encompasses nucleic acid molecules that differ from the nucleotide sequence of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901 due to degeneracy of the genetic code and thus encode the same protein as that encoded by the nucleotide sequence shown in any of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901.
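Degeneracy means that several codons specify the same amino acid, so different nucleotide sequences can encode an identical protein. The sketch below builds the standard genetic code table and translates two toy coding sequences that differ at several positions yet encode the same peptide. The sequences and the function name are illustrative; the table is the standard code, not one specific to any sequence discussed here.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of 3 ('*' = stop)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Synonymous codons at positions 2-4: different DNA, same peptide.
original = "ATGCTGAGCAAA"
variant = "ATGTTATCTAAG"
print(translate(original), translate(variant))
```

Both sequences translate to the same four-residue peptide (MLSK), which is the sense in which degenerate variants "encode the same protein".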

In addition to the nucleotide sequences shown in SEQ ID NOs: 1-22, 34-43, and - and present in the cDNAs of the clones deposited as ATCC 98899, 98900, and 98901, it will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid sequence may exist within a population (e.g., the human population). Such genetic polymorphisms may exist among individuals within a population due to natural allelic variation. An allele is one of a group of genes which occur alternatively at a given genetic locus. As used herein, the phrase "allelic variant" refers to a nucleotide sequence which occurs at a given locus or to a polypeptide encoded by the nucleotide sequence. As used herein, the terms "gene" and "recombinant gene" refer to nucleic acid molecules comprising an open reading frame encoding a polypeptide of the invention. Such natural allelic variations can typically result in 1-5% variance in the nucleotide sequence of a given gene. Alternative alleles can be identified by sequencing the gene of interest in a number of different individuals. This can be readily carried out by using hybridization probes to identify the same genetic locus in a variety of individuals. Any and all such nucleotide variations and resulting amino acid polymorphisms or variations that are the result of natural allelic variation and that do not alter the functional activity are intended to be within the scope of the invention.

Moreover, nucleic acid molecules encoding proteins of the invention from other species (homologues), which have a nucleotide sequence which differs from that of the human protein described herein, are intended to be within the scope of the invention. Nucleic acid molecules corresponding to natural allelic variants and homologues of a cDNA of the invention can be isolated based on their identity to the human nucleic acid molecule disclosed herein using the human cDNAs, or a portion thereof, as a hybridization probe according to standard hybridization techniques under stringent hybridization conditions. For example, a cDNA encoding a soluble form of a membrane-bound protein of the invention can be isolated based on its hybridization to a nucleic acid molecule encoding all or part of the membrane-bound form. Likewise, a cDNA encoding a membrane-bound form can be isolated based on its hybridization to a nucleic acid molecule encoding all or part of the soluble form.

Accordingly, in another embodiment, an isolated nucleic acid molecule of the invention is at least 300 (325, 350, 375, 400, 425, 450, 500, 550, 600, 650, 700, 800, 900, 1000, or 1290) nucleotides in length and hybridizes under stringent conditions to the nucleic acid molecule comprising the nucleotide sequence, preferably the coding sequence, of any of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901, or a complement thereof.

As used herein, the term "hybridizes under stringent conditions" is intended to describe conditions for hybridization and washing under which nucleotide sequences at least 60% (65%, 70%, preferably 75%) identical to each other typically remain hybridized to each other. Such stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. A preferred, non-limiting example of stringent hybridization conditions is hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 50-65°C. Preferably, an isolated nucleic acid molecule of the invention that hybridizes under stringent conditions to the sequence of any of SEQ ID NOs: 1-22, 34-43, and -, the cDNA of ATCC 98899, 98900, and 98901, or the complement thereof, corresponds to a naturally-occurring nucleic acid molecule. As used herein, a "naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having a nucleotide sequence that occurs in nature (e.g., encodes a natural protein).
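The example stringent conditions are expressed as multiples of SSC. Assuming the standard definition of 1X SSC as 150 mM NaCl and 15 mM sodium citrate, with working solutions diluted from a 20X stock, the sketch below computes the salt concentrations of the 6X hybridization and 0.2X wash buffers and the stock volume each requires. The helper names are hypothetical, and SDS and temperature are not modeled.

```python
# Standard definition (assumed): 1X SSC = 150 mM NaCl + 15 mM sodium citrate.
NACL_1X_MM = 150.0
CITRATE_1X_MM = 15.0


def ssc_composition(x: float) -> dict[str, float]:
    """Millimolar NaCl and sodium citrate in x-fold SSC."""
    return {"NaCl_mM": NACL_1X_MM * x, "citrate_mM": CITRATE_1X_MM * x}


def stock_volume_ml(final_x: float, final_ml: float, stock_x: float = 20.0) -> float:
    """Volume of stock SSC for final_ml of final_x SSC (C1 * V1 = C2 * V2)."""
    if final_x > stock_x:
        raise ValueError("cannot dilute to a concentration above the stock")
    return final_x * final_ml / stock_x


hyb = ssc_composition(6)      # hybridization buffer in the example conditions
wash = ssc_composition(0.2)   # wash buffer (0.1% SDS not modeled)
print(hyb, wash)
print(stock_volume_ml(6, 100))     # mL of 20X stock per 100 mL of 6X SSC
print(stock_volume_ml(0.2, 1000))  # mL of 20X stock per litre of 0.2X SSC
```

On this definition the 6X hybridization buffer is 900 mM NaCl and the 0.2X wash is 30 mM NaCl, which is why the wash step is the more stringent one.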

In addition to naturally-occurring allelic variants of a nucleic acid molecule of the invention that may exist in the population, the skilled artisan will further appreciate that changes can be introduced by mutation, thereby leading to changes in the amino acid sequence of the encoded protein, without altering the biological activity of the protein. For example, one can make nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid residues. A "non-essential" amino acid residue is a residue that can be altered from the wild-type sequence without altering the biological activity, whereas an "essential" amino acid residue is required for biological activity. For example, amino acid residues that are not conserved or only semi-conserved among homologues of various species may be non-essential for activity and thus would be likely targets for alteration. Alternatively, amino acid residues that are conserved among the homologues of various species (e.g., murine and human) may be essential for activity and thus would not be likely targets for alteration. Conserved cysteine residues are particularly important and are preferably retained in functional variants.

Accordingly, another aspect of the invention pertains to nucleic acid molecules encoding a polypeptide of the invention that contain changes in amino acid residues that are not essential for activity. Such polypeptides differ in amino acid sequence from SEQ ID NOs: 23-33, 54-63, and - yet retain biological activity. In one embodiment, the isolated nucleic acid molecule includes a nucleotide sequence encoding a protein that includes an amino acid sequence that is at least about 45%, 65%, 75%, 85%, 95%, or 98% identical to the amino acid sequence of any of SEQ ID NOs: 23-33, 54-63, and -.

An isolated nucleic acid molecule encoding a variant protein can be created by introducing one or more nucleotide substitutions, additions, or deletions into the nucleotide sequence of SEQ ID NOs: 1-22, 34-43, and - or the cDNA of a clone deposited as any of ATCC 98899, 98900, and 98901 such that one or more amino acid substitutions, additions, or deletions are introduced into the encoded protein. Mutations can be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one or more predicted non-essential amino acid residues. A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine), and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Alternatively, mutations can be introduced randomly along all or part of the coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for biological activity to identify mutants that retain activity. Following mutagenesis, the encoded protein can be expressed recombinantly and the activity of the protein can be determined.
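The side-chain families listed above can be encoded as sets of one-letter amino acid codes, so that a proposed substitution counts as conservative when both residues fall in a shared family. This is a minimal sketch using only the example members named in the text (including glycine among the uncharged polar residues, as listed there); other published groupings differ, and `is_conservative` is a hypothetical name.

```python
# Side-chain families, using the example members named in the text.
FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues share at least one side-chain family."""
    return any(original in fam and replacement in fam for fam in FAMILIES.values())


print(is_conservative("K", "R"))  # both basic
print(is_conservative("D", "L"))  # acidic vs. nonpolar
```

Because some residues sit in more than one family (threonine, histidine), a substitution such as T to V qualifies through the beta-branched family even though the two fall in different polarity classes.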

In a preferred embodiment, a mutant polypeptide that is a variant of a polypeptide of the invention can be assayed for: (1) the ability to form protein:protein interactions with proteins in a signalling pathway of the polypeptide of the invention; (2) the ability to bind a ligand of the polypeptide of the invention; or (3) the ability to bind to an intracellular target protein of the polypeptide of the invention. In yet another preferred embodiment, the mutant polypeptide can be assayed for the ability to modulate cellular proliferation or cellular differentiation.

The present invention encompasses antisense nucleic
acid molecules, i.e., molecules which are complementary
to a sense nucleic acid encoding a polypeptide of the
invention, e.g., complementary to the coding strand of a
double-stranded cDNA molecule or complementary to an mRNA
sequence. Accordingly, an antisense nucleic acid can
hydrogen bond to a sense nucleic acid. The antisense
nucleic acid can be complementary to an entire coding
strand, or to only a portion thereof, e.g., all or part
of the protein coding region (or open reading frame). An
antisense nucleic acid molecule can be antisense to all
or part of a noncoding region of the coding strand of a
nucleotide sequence encoding a polypeptide of the
invention. The noncoding regions ("5' and 3'
untranslated regions") are the 5' and 3' sequences which
flank the coding region and are not translated into amino
acids.

An antisense oligonucleotide can be, for example,
about 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides
in length. An antisense nucleic acid of the invention
can be constructed using chemical synthesis and enzymatic
ligation reactions using procedures known in the art.
For example, an antisense nucleic acid (e.g., an
antisense oligonucleotide) can be chemically synthesized
using naturally occurring nucleotides or variously
modified nucleotides designed to increase the biological
stability of the molecules or to increase the physical
stability of the duplex formed between the antisense and
sense nucleic acids, e.g., phosphorothioate derivatives
and acridine-substituted nucleotides can be used.
Examples of modified nucleotides which can be used to
generate the antisense nucleic acid include
5-fluorouracil, 5-bromouracil, 5-chlorouracil,
5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl)uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil,
5-methoxyuracil, 2-methylthio-N6-isopentenyladenine,
uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil,
2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic
acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, and
2,6-diaminopurine. Alternatively, the antisense nucleic
acid can be produced biologically using an expression
vector into which a nucleic acid has been subcloned in an
antisense orientation (i.e., RNA transcribed from the
inserted nucleic acid will be of an antisense orientation
to a target nucleic acid of interest, described further
in the following subsection).
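Because an antisense molecule pairs antiparallel with its target, its sequence is the reverse complement of the targeted mRNA segment. A minimal Python sketch, assuming an RNA target written 5' to 3' using only A, C, G, and U; the helper names and the default 20-nt window are hypothetical choices within the 5 to 50 nt range mentioned above, and the sketch ignores the modified nucleotides listed in the text.

```python
# Watson-Crick pairing for an RNA target (A-U, G-C).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(target_rna: str, as_dna: bool = False) -> str:
    """Return the sequence complementary to target_rna, written 5'->3'.

    Antiparallel pairing means the complement is read in reverse.
    """
    seq = "".join(RNA_COMPLEMENT[base] for base in reversed(target_rna.upper()))
    return seq.replace("U", "T") if as_dna else seq


def tile_antisense(mrna: str, length: int = 20, step: int = 1):
    """Yield (start, oligo) for every `length`-nt window along mrna."""
    if not 5 <= length <= 50:
        raise ValueError("length outside the 5-50 nt range given in the text")
    for start in range(0, len(mrna) - length + 1, step):
        yield start, antisense(mrna[start:start + length])


print(antisense("AUGC"))                          # GCAU
print(list(tile_antisense("AUGGCC", length=5)))   # [(0, 'GCCAU'), (1, 'GGCCA')]
```

Setting `as_dna=True` writes the oligonucleotide with T in place of U, as for a DNA antisense oligonucleotide against the same target.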

The antisense nucleic acid molecules of the invention
are typically administered to a subject or generated in
situ such that they hybridize with or bind to cellular
mRNA and/or genomic DNA encoding a selected polypeptide
of the invention to thereby inhibit expression, e.g., by
inhibiting transcription and/or translation. The
hybridization can be by conventional nucleotide
complementarity to form a stable duplex, or, for example,
in the case of an antisense nucleic acid molecule which
binds to DNA duplexes, through specific interactions in
the major groove of the double helix. An example of a
route of administration of antisense nucleic acid
molecules of the invention includes direct injection at a
tissue site. Alternatively, antisense nucleic acid
molecules can be modified to target selected cells and
then administered systemically. For example, for
systemic administration, antisense molecules can be
modified such that they specifically bind to receptors or
antigens expressed on a selected cell surface, e.g., by
linking the antisense nucleic acid molecules to peptides
or antibodies which bind to cell surface receptors or
antigens. The antisense nucleic acid molecules can also
be delivered to cells using the vectors described herein.
To achieve sufficient intracellular concentrations of the
antisense molecules, vector constructs in which the
antisense nucleic acid molecule is placed under the
control of a strong pol II or pol III promoter are
preferred.

An antisense nucleic acid molecule of the invention can
be an α-anomeric nucleic acid molecule. An α-anomeric
nucleic acid molecule forms specific double-stranded
hybrids with complementary RNA in which, contrary to the
usual β-units, the strands run parallel to each other
(Gaultier et al. (1987) Nucleic Acids Res. 15:6625-6641).
The antisense nucleic acid molecule can also comprise a
2'-O-methylribonucleotide (Inoue et al. (1987) Nucleic
Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue
(Inoue et al. (1987) FEBS Lett. 215:327-330).

The invention also encompasses ribozymes. Ribozymes
are catalytic RNA molecules with ribonuclease activity
which are capable of cleaving a single-stranded nucleic
acid, such as an mRNA, to which they have a complementary
region. Thus, ribozymes (e.g., hammerhead ribozymes
(described in Haseloff and Gerlach (1988) Nature
334:585-591)) can be used to catalytically cleave mRNA
transcripts to thereby inhibit translation of the protein
encoded by the mRNA. A ribozyme having specificity for a
nucleic acid molecule encoding a polypeptide of the
invention can be designed based upon the nucleotide
sequence of a cDNA disclosed herein. For example, a
derivative of a Tetrahymena L-19 IVS RNA can be
constructed in which the nucleotide sequence of the
active site is complementary to the nucleotide sequence
to be cleaved in an mRNA encoding a polypeptide of the
invention. See, e.g., Cech et al. U.S. Patent No. 4,987,071;
and Cech et al. U.S. Patent No. 5,116,742.
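Hammerhead ribozymes are commonly designed against an NUH triplet in the target (GUC being the classic choice), with cleavage on the 3' side of that triplet and two ribozyme arms pairing with the flanking sequence. The Python sketch below is an illustrative assumption, not the design procedure of this disclosure: it only scans a target mRNA for GUC sites and reports the flanking segments a hypothetical ribozyme's arms would need to complement. The function name and the 8-nt `arm` default are placeholders.

```python
def hammerhead_sites(mrna: str, arm: int = 8, triplet: str = "GUC"):
    """Return (cleavage_index, upstream, downstream) for each triplet.

    cleavage_index is the 0-based position just 3' of the triplet;
    upstream/downstream are the `arm`-nt segments on either side of it.
    Sites too close to either end for full-length arms are skipped.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    sites = []
    for i in range(len(mrna) - len(triplet) + 1):
        if mrna[i:i + len(triplet)] != triplet:
            continue
        cut = i + len(triplet)
        if cut - arm < 0 or cut + arm > len(mrna):
            continue
        sites.append((cut, mrna[cut - arm:cut], mrna[cut:cut + arm]))
    return sites


print(hammerhead_sites("AAAAGUCAAAA", arm=4))  # [(7, 'AGUC', 'AAAA')]
```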

Alternatively, an mRNA encoding a polypeptide of the
invention can be used to select a catalytic RNA having a
specific ribonuclease activity from a pool of RNA
molecules. See, e.g., Bartel and Szostak (1993) Science
261:1411-1418.

The invention also encompasses nucleic acid molecules
which form triple helical structures. For example,
expression of a polypeptide of the invention can be
inhibited by targeting nucleotide sequences complementary
to the regulatory region of the gene encoding the
polypeptide (e.g., the promoter and/or enhancer) to form
triple helical structures that prevent transcription of
the gene in target cells. See generally Helene (1991)
Anticancer Drug Des. 6(6):569-84; Helene (1992) Ann. N.Y.
Acad. Sci. 660:27-36; and Maher (1992) BioEssays
14(12):807-15.
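Triplex-forming oligonucleotides generally bind in the major groove at homopurine/homopyrimidine stretches of duplex DNA. As an illustrative assumption (the function name and the 15-bp minimum tract length are hypothetical defaults, not values from this disclosure), candidate target sites in a promoter or enhancer sequence could be located with a simple scan:

```python
import re


def triplex_candidates(dna: str, min_len: int = 15):
    """Return (start, end, tract) for purine-only or pyrimidine-only runs.

    A purine run on this strand is a pyrimidine run on the complementary
    strand, so scanning one strand for both run types covers the duplex.
    """
    dna = dna.upper()
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(dna)]


print(triplex_candidates("CCAGGAGATC", min_len=5))  # [(2, 8, 'AGGAGA')]
```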

In preferred embodiments, the nucleic acid molecules of
the invention can be modified at the base moiety, sugar
moiety, or phosphate backbone to improve, e.g., the
stability, hybridization, or solubility of the molecule.
For example, the deoxyribose phosphate backbone of the
nucleic acids can be modified to generate peptide nucleic
acids (see Hyrup et al. (1996) Bioorganic & Medicinal
Chemistry 4(1):5-23). As used herein, the terms
"peptide nucleic acids" or "PNAs" refer to nucleic acid
mimics, e.g., DNA mimics, in which the deoxyribose
phosphate backbone is replaced by a pseudopeptide
backbone and only the four natural nucleobases are
retained. The neutral backbone of PNAs has been shown to
allow for specific hybridization to DNA and RNA under
conditions of low ionic strength. The synthesis of PNA
oligomers can be performed using standard solid phase
peptide synthesis protocols as described in Hyrup et al.
(1996), supra; Perry-O'Keefe et al. (1996) Proc. Natl.
Acad. Sci. USA 93:14670-675.

PNAs can be used in therapeutic and diagnostic
applications. For example, PNAs can be used as antisense
or antigene agents for sequence-specific modulation of
gene expression by, e.g., inducing transcription or
translation arrest or inhibiting replication. PNAs can
also be used, e.g., in the analysis of single base pair
mutations in a gene by, e.g., PNA-directed PCR clamping;
as artificial restriction enzymes when used in
combination with other enzymes, e.g., S1 nucleases (Hyrup
(1996), supra); or as probes or primers for DNA sequence
and hybridization (Hyrup (1996), supra; Perry-O'Keefe et
al. (1996) Proc. Natl. Acad. Sci. USA 93:14670-675).
In another embodiment, PNAs can be modified, e.g., to
enhance their stability or cellular uptake, by attaching
lipophilic or other helper groups to PNA, by the
formation of PNA-DNA chimeras, or by the use of liposomes
or other techniques of drug delivery known in the art.
For example, PNA-DNA chimeras can be generated which may
combine the advantageous properties of PNA and DNA. Such
chimeras allow DNA recognition enzymes, e.g., RNase H and
DNA polymerases, to interact with the DNA portion while
the PNA portion provides high binding affinity and
specificity. PNA-DNA chimeras can be linked using
linkers of appropriate lengths selected in terms of base
stacking, number of bonds between the nucleobases, and
orientation (Hyrup (1996), supra). The synthesis of
PNA-DNA chimeras can be performed as described in Hyrup
(1996), supra, and Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63. For example, a DNA chain can be
synthesized on a solid support using standard
phosphoramidite coupling chemistry and modified
nucleoside analogs. Compounds such as
5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite
can be used as a link between the PNA and the 5' end of
DNA (Mag et al. (1989) Nucleic Acids Res. 17:5973-88).
PNA monomers are then coupled in a stepwise manner to
produce a chimeric molecule with a 5' PNA segment and a
3' DNA segment (Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63). Alternatively, chimeric molecules can
be synthesized with a 5' DNA segment and a 3' PNA segment
(Petersen et al. (1995) Bioorganic Med. Chem. Lett.
5:1119-1124).

In other embodiments, the oligonucleotide may include
other appended groups such as peptides (e.g., for
targeting host cell receptors in vivo), or agents
facilitating transport across the cell membrane (see,
e.g., Letsinger et al. (1989) Proc. Natl. Acad. Sci. USA
86:6553-6556; Lemaitre et al. (1987) Proc. Natl. Acad.
Sci. USA 84:648-652; PCT Publication No. WO 88/09810) or
the blood-brain barrier (see, e.g., PCT Publication No.
WO 89/10134). In addition, oligonucleotides can be
modified with hybridization-triggered cleavage agents
(see, e.g., Krol et al. (1988) Bio/Techniques 6:958-976)
or intercalating agents (see, e.g., Zon (1988) Pharm.
Res. 5:539-549). To this end, the oligonucleotide may be
conjugated to another molecule, e.g., a peptide, a
hybridization-triggered cross-linking agent, a transport
agent, a hybridization-triggered cleavage agent, etc.

II. Isolated Proteins and Antibodies

One aspect of the invention pertains to isolated
proteins, and biologically active portions thereof, as
well as polypeptide fragments suitable for use as
immunogens to raise antibodies directed against a
polypeptide of the invention. In one embodiment, the
native polypeptide can be isolated from cells or tissue
sources by an appropriate purification scheme using
standard protein purification techniques. In another
embodiment, polypeptides of the invention are produced by
recombinant DNA techniques. As an alternative to
recombinant expression, a polypeptide of the invention can
be synthesized chemically using standard peptide synthesis
techniques.

An "isolated" or "purified" protein or biologically
active portion thereof is substantially free of cellular
material or other contaminating proteins from the cell or
tissue source from which the protein is derived, or
substantially free of chemical precursors or other
chemicals when chemically synthesized. The language
"substantially free of cellular material" includes
preparations of protein in which the protein is separated
from cellular components of the cells from which it is
isolated or recombinantly produced. Thus, protein that
is substantially free of cellular material includes
preparations of protein having less than about 30%, 20%,
10%, or 5% (by dry weight) of heterologous protein (also
referred to herein as a "contaminating protein"). When
the protein or biologically active portion thereof is
recombinantly produced, it is also preferably
substantially free of culture medium, i.e., culture
medium represents less than about 20%, 10%, or 5% of the
volume of the protein preparation. When the protein is
produced by chemical synthesis, it is preferably
substantially free of chemical precursors or other
chemicals, i.e., it is separated from chemical precursors
or other chemicals which are involved in the synthesis of
the protein. Accordingly, such preparations of the
protein have less than about 30%, 20%, 10%, or 5% (by dry
weight) of chemical precursors or compounds other than
the polypeptide of interest.
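The dry-weight thresholds above reduce to simple ratio arithmetic. The hypothetical helpers below treat the "about" thresholds as strict cut-offs purely for illustration: a 50 mg preparation containing 2 mg of contaminating protein is 4% contaminant by dry weight and so falls below the 5% tier, while 12 mg in 50 mg (24%) meets only the 30% tier.

```python
def contaminant_percent(contaminant_mg: float, total_mg: float) -> float:
    """Share of a preparation's dry weight that is contaminant, in percent."""
    if total_mg <= 0:
        raise ValueError("total dry weight must be positive")
    return 100.0 * contaminant_mg / total_mg


def tightest_tier(contaminant_mg: float, total_mg: float,
                  tiers=(30, 20, 10, 5)):
    """Smallest listed threshold (%) the preparation falls below, or None."""
    pct = contaminant_percent(contaminant_mg, total_mg)
    met = [t for t in tiers if pct < t]
    return min(met) if met else None


print(contaminant_percent(2, 50))  # 4.0
print(tightest_tier(2, 50))        # 5
print(tightest_tier(12, 50))       # 30
```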

Biologically active portions of a polypeptide of the
invention include polypeptides comprising amino acid
sequences sufficiently identical to or derived from the
amino acid sequence of the protein (e.g., the amino acid
sequence shown in any of SEQ ID NOs:23-33, 54-63, and -),
which include fewer amino acids than the full-length
protein and exhibit at least one activity of the
corresponding full-length protein. Typically,
biologically active portions comprise a domain or motif
with at least one activity of the corresponding protein.
A biologically active portion of a protein of the
invention can be a polypeptide which is, for example, 10,
25, 50, 100 or more amino acids in length. Moreover,
other biologically active portions, in which other
regions of the protein are deleted, can be prepared by
recombinant techniques and evaluated for one or more of
the functional activities of the native form of a
polypeptide of the invention.

Preferred polypeptides have the amino acid sequence of
any of SEQ ID NOs:23-33, 54-63, and -. Other
useful proteins are substantially identical (e.g., at
least about 45%, preferably 55%, 65%, 75%, 85%, 95%, or
99%) to any of SEQ ID NOs:23-33, 54-63, and - and
retain the functional activity of the corresponding
naturally occurring protein yet differ in
amino acid sequence due to natural allelic variation or
mutagenesis.

To determine the percent identity of two amino acid
sequences or of two nucleic acids, the sequences are
aligned for optimal comparison purposes (e.g., gaps can
be introduced in the sequence of a first amino acid or
nucleic acid sequence for optimal alignment with a second
amino acid or nucleic acid sequence). The amino acid
residues or nucleotides at corresponding amino acid
positions or nucleotide positions are then compared. When
a position in the first sequence is occupied by the same
amino acid residue or nucleotide as the corresponding
position in the second sequence, then the molecules are
identical at that position. The percent identity between
the two sequences is a function of the number of identical
positions shared by the sequences (i.e., % identity =
number of identical positions / total number of positions
(e.g., overlapping positions) x 100). Preferably, the two
sequences are the same length.
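The percent-identity formula above can be applied directly to a pair of already-aligned sequences. In this minimal Python sketch (the function name is hypothetical), every alignment column, gapped or not, counts toward the total number of positions; that denominator is an assumption, and other conventions (for example, counting only columns where both sequences have a residue) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """% identity = identical positions / total aligned positions x 100.

    Both inputs must already be aligned (equal length, gaps marked).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b)
                    if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


print(percent_identity("AC-EF", "ACDEF"))  # 80.0
```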

The determination of percent homology between two
sequences can be accomplished using a mathematical
algorithm. A preferred, non-limiting example of a
mathematical algorithm utilized for the comparison of two
sequences is the algorithm of Karlin and Altschul (1990)
Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in
Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA
90:5873-5877. Such an algorithm is incorporated into the
NBLAST and XBLAST programs of Altschul et al. (1990) J.
Mol. Biol. 215:403-410. BLAST nucleotide searches can be
performed with the NBLAST program, score = 100,
wordlength = 12, to obtain nucleotide sequences homologous
to nucleic acid molecules of the invention. BLAST
protein searches can be performed with the XBLAST
program, score = 50, wordlength = 3, to obtain amino acid
sequences homologous to protein molecules of the
invention. To obtain gapped alignments for comparison
purposes, Gapped BLAST can be utilized as described in
Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402.
Alternatively, PSI-BLAST can be used to perform an
iterated search which detects distant relationships
between molecules. Id. When utilizing BLAST, Gapped
BLAST, and PSI-BLAST programs, the default parameters of
the respective programs (e.g., XBLAST and NBLAST) can be
used. See http://www.ncbi.nlm.nih.gov. Another
preferred, non-limiting example of a mathematical
algorithm utilized for the comparison of sequences is the
algorithm of Myers and Miller (1988) CABIOS 4:11-17.
Such an algorithm is incorporated into the ALIGN program
(version 2.0), which is part of the GCG sequence alignment
software package. When utilizing the ALIGN program for
comparing amino acid sequences, a PAM120 weight residue
table, a gap length penalty of 12, and a gap penalty of 4
can be used.
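The ALIGN program pairs the Myers and Miller algorithm with a PAM120 table and separate gap penalties. The sketch below is deliberately simpler and is a swapped-in technique, not ALIGN: a Needleman-Wunsch global alignment in Python with hypothetical match/mismatch scores and a single linear gap penalty. It is enough to produce the optimally aligned pair that the percent-identity calculation described above takes as input.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); gaps are written as '-'.
    """
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell along one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = global_align("GATTACA", "GATACA")
print(best)  # 4 (six matches, one gap)
```

When several alignments tie for the best score, the traceback order (diagonal, then up, then left) decides which one is returned.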

The percent identity between two sequences can be 
determined using techniques similar to those described 
above, with or without allowing gaps. In calculating 
percent identity, only exact matches are counted. 

The invention also provides chimeric or fusion
proteins. As used herein, a "chimeric protein" or
"fusion protein" comprises all or part (preferably
biologically active) of a polypeptide of the invention
operably linked to a heterologous polypeptide (i.e., a
polypeptide other than the same polypeptide of the
invention). Within the fusion protein, the term
"operably linked" is intended to indicate that the
polypeptide of the invention and the heterologous
polypeptide are fused in-frame to each other. The
heterologous polypeptide can be fused to the N-terminus
or C-terminus of the polypeptide of the invention.
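"Fused in-frame" means the upstream coding segment contributes a whole number of codons and no stop codon, so translation reads straight through the junction. A minimal Python sketch of that check using the standard genetic code; the function names and the short example segments are hypothetical, and a real construct would also need its upstream stop codon removed before joining.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two coding segments and return the fusion's translation.

    Raises ValueError if the upstream segment would shift the reading
    frame or if a stop codon occurs before the final codon.
    """
    if len(upstream) % 3 != 0:
        raise ValueError("upstream length is not a multiple of 3")
    protein = translate(upstream + downstream)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon in fusion")
    return protein


print(fuse_in_frame("ATGGCT", "GGTTAA"))  # MAG*
```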

One useful fusion protein is a GST fusion protein in
which the polypeptide of the invention is fused to the
C-terminus of GST sequences. Such fusion proteins can
facilitate the purification of a recombinant polypeptide
of the invention.

In another embodiment, the fusion protein contains a
heterologous signal sequence at its N-terminus. For
example, the native signal sequence of a polypeptide of
the invention can be removed and replaced with a signal
sequence from another protein. For example, the gp67
secretory sequence of the baculovirus envelope protein
can be used as a heterologous signal sequence (Current
Protocols in Molecular Biology, Ausubel et al., eds.,
John Wiley & Sons, 1992). Other examples of eukaryotic
heterologous signal sequences include the secretory
sequences of melittin and human placental alkaline
phosphatase (Stratagene; La Jolla, California). In yet
another example, useful prokaryotic heterologous signal
sequences include the phoA secretory signal (Sambrook et
al., supra) and the protein A secretory signal (Pharmacia
Biotech; Piscataway, New Jersey).

In yet another embodiment, the fusion protein is an
immunoglobulin fusion protein in which all or part of a
polypeptide of the invention is fused to sequences
derived from a member of the immunoglobulin protein
family. The immunoglobulin fusion proteins of the
invention can be incorporated into pharmaceutical
compositions and administered to a subject to inhibit an
interaction between a ligand (soluble or membrane-bound)
and a protein on the surface of a cell (receptor), to
thereby suppress signal transduction in vivo. The
immunoglobulin fusion protein can be used to affect the
bioavailability of a cognate ligand of a polypeptide of
the invention. Inhibition of ligand/receptor interaction
may be useful therapeutically, both for treating
proliferative and differentiative disorders and for
modulating (e.g., promoting or inhibiting) cell survival.
Moreover, the immunoglobulin fusion proteins of the
invention can be used as immunogens to produce antibodies
directed against a polypeptide of the invention in a
subject, to purify ligands, and in screening assays to
identify molecules which inhibit the interaction of
receptors with ligands.

Chimeric and fusion proteins of the invention can be
produced by standard recombinant DNA techniques. In
another embodiment, the fusion gene can be synthesized by
conventional techniques including automated DNA
synthesizers. Alternatively, PCR amplification of gene
fragments can be carried out using anchor primers which
give rise to complementary overhangs between two
consecutive gene fragments which can subsequently be
annealed and reamplified to generate a chimeric gene
sequence (see, e.g., Ausubel et al., supra). Moreover,
many expression vectors are commercially available that
already encode a fusion moiety (e.g., a GST polypeptide).
A nucleic acid encoding a polypeptide of the invention
can be cloned into such an expression vector such that
the fusion moiety is linked in-frame to the polypeptide
of the invention.

A signal sequence of a polypeptide of the invention
(SEQ ID NOs:64-75) can be used to facilitate secretion
and isolation of the secreted protein or other proteins
of interest. Signal sequences are typically
characterized by a core of hydrophobic amino acids and
are generally cleaved from the mature protein during
secretion in one or more cleavage events. Such signal
peptides contain processing sites that allow cleavage of
the signal sequence from the mature proteins as they pass
through the secretory pathway. Thus, the invention
pertains to the described polypeptides having a signal
sequence, as well as to the signal sequence itself and to
the polypeptide in the absence of the signal sequence
(i.e., the cleavage products). In one embodiment, a
nucleic acid sequence encoding a signal sequence of the
invention can be operably linked in an expression vector
to a protein of interest, such as a protein which is
ordinarily not secreted or is otherwise difficult to
isolate. The signal sequence directs secretion of the
protein, such as from a eukaryotic host into which the
expression vector is transformed, and the signal sequence
is subsequently or concurrently cleaved. The protein can
then be readily purified from the extracellular medium by
art-recognized methods. Alternatively, the signal
sequence can be linked to the protein of interest using a
sequence which facilitates purification, such as a
GST domain.

In another embodiment, the signal sequences of the
present invention can be used to identify regulatory
sequences, e.g., promoters, enhancers, repressors. Since
signal sequences are the most amino-terminal sequences of
a peptide, it is expected that the nucleic acids which
flank the signal sequence on its amino-terminal side will
be regulatory sequences which affect transcription.
Thus, a nucleotide sequence which encodes all or a
portion of a signal sequence can be used as a probe to
identify and isolate signal sequences and their flanking
regions, and these flanking regions can be studied to
identify regulatory elements therein.

The present invention also pertains to variants of the
polypeptides of the invention. Such variants have an
altered amino acid sequence and can function as either
agonists (mimetics) or as antagonists. Variants can be
generated by mutagenesis, e.g., discrete point mutation
or truncation. An agonist can retain substantially the
same, or a subset, of the biological activities of the
naturally occurring form of the protein. An antagonist
of a protein can inhibit one or more of the activities of
the naturally occurring form of the protein by, for
example, competitively binding to a downstream or
upstream member of a cellular signaling cascade which
includes the protein of interest. Thus, specific
biological effects can be elicited by treatment with a
variant of limited function. Treatment of a subject with
a variant having a subset of the biological activities of
the naturally occurring form of the protein can have
fewer side effects in a subject relative to treatment
with the naturally occurring form of the protein.

Variants of a protein of the invention which function
as either agonists (mimetics) or as antagonists can be
identified by screening combinatorial libraries of
mutants, e.g., truncation mutants, of the protein of the
invention for agonist or antagonist activity. In one
embodiment, a variegated library of variants is generated
by combinatorial mutagenesis at the nucleic acid level
and is encoded by a variegated gene library. A
variegated library of variants can be produced by, for
example, enzymatically ligating a mixture of synthetic
oligonucleotides into gene sequences such that a
degenerate set of potential protein sequences is
expressible as individual polypeptides, or alternatively,
as a set of larger fusion proteins (e.g., for phage
display). There are a variety of methods which can be
used to produce libraries of potential variants of the
polypeptides of the invention from a degenerate
oligonucleotide sequence. Methods for synthesizing
degenerate oligonucleotides are known in the art (see,
e.g., Narang (1983) Tetrahedron 39:3; Itakura et al.
(1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1977)
Science 198:1056; Ike et al. (1983) Nucleic Acids Res.
11:477).
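A degenerate oligonucleotide specifies a mixture of sequences through IUPAC ambiguity codes, and the library it encodes can be enumerated directly. As an assumption not stated in the text, the example uses the commonly chosen NNK codon (N = any base, K = G or T), which yields 32 codons covering all 20 amino acids and a single stop codon (TAG); the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand(degenerate: str) -> list:
    """All concrete DNA sequences encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate.upper()))]


def codon_summary(degenerate_codon: str):
    """(number of codons, set of amino acids encoded, number of stops)."""
    codons = expand(degenerate_codon)
    residues = [CODON_TABLE[c] for c in codons]
    return len(codons), set(residues) - {"*"}, residues.count("*")


n_codons, residues, stops = codon_summary("NNK")
print(n_codons, len(residues), stops)  # 32 20 1
print(len(expand("NNK" * 3)))          # 32768 DNA variants for 3 positions
```

The DNA-level library size grows as 32 to the power of the number of randomized positions, which is one practical limit on how many sites can be varied at once.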

In addition, libraries of fragments of the coding
sequence of a polypeptide of the invention can be used to
generate a variegated population of polypeptides for
screening and subsequent selection of variants. For
example, a library of coding sequence fragments can be
generated by treating a double-stranded PCR fragment of
the coding sequence of interest with a nuclease under
conditions wherein nicking occurs only about once per
molecule, denaturing the double-stranded DNA, renaturing
the DNA to form double-stranded DNA which can include
sense/antisense pairs from different nicked products,
removing single-stranded portions from reformed duplexes
by treatment with S1 nuclease, and ligating the resulting
fragment library into an expression vector. By this
method, an expression library can be derived which
encodes N-terminal and internal fragments of various
sizes of the protein of interest.

Several techniques are known in the art for screening
gene products of combinatorial libraries made by point
mutations or truncation, and for screening cDNA libraries
for gene products having a selected property. The most
widely used techniques for screening large gene libraries,
which are amenable to high-throughput analysis, typically
include cloning the gene library into replicable
expression vectors, transforming appropriate cells with
the resulting library of vectors, and expressing the
combinatorial genes under conditions in which detection
of a desired activity facilitates isolation of the vector
encoding the gene whose product was detected. Recursive
ensemble mutagenesis (REM), a technique which enhances
the frequency of functional mutants in the libraries, can
be used in combination with the screening assays to
identify variants of a protein of the invention (Arkin
and Youvan (1992) Proc. Natl. Acad. Sci. USA
89:7811-7815; Delagrave et al. (1993) Protein
Engineering 6(3):327-331).

An isolated polypeptide of the invention, or a fragment
thereof, can be used as an immunogen to generate
antibodies using standard techniques for polyclonal and
monoclonal antibody preparation. The full-length
polypeptide or protein can be used or, alternatively, the
invention provides antigenic peptide fragments for use as
immunogens. The antigenic peptide of a protein of the
invention comprises at least 8 (preferably 10, 15, 20, or
30) amino acid residues of the amino acid sequence shown
in any of SEQ ID NOs:23-33, 54-63, and - and
encompasses an epitope of the protein such that an
antibody raised against the peptide forms a specific
immune complex with the protein.

Preferred epitopes encompassed by the antigenic peptide
are regions that are located on the surface of the
protein, e.g., hydrophilic regions, rather than
hydrophobic regions, e.g., transmembrane domains. The
hydrophilicity of a protein sequence can be easily
determined using readily available programs.
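One widely used way to estimate hydrophilicity is a sliding-window average over a per-residue hydropathy scale. The Python sketch below swaps in the Kyte-Doolittle scale (negative values are hydrophilic) as an illustration; it is not necessarily what any particular program does, and Hopp-Woods or other scales are common alternatives. The function name and the `top` parameter are hypothetical; the 8-residue minimum follows the antigenic peptide length given above.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_peptides(protein: str, length: int = 8, top: int = 3):
    """Rank every `length`-residue window by mean hydropathy, lowest first.

    Returns (start, peptide, mean_score) tuples for the `top` most
    hydrophilic windows, as candidate antigenic peptides.
    """
    if length < 8:
        raise ValueError("antigenic peptides should span at least 8 residues")
    windows = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        score = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / length
        windows.append((start, peptide, score))
    return sorted(windows, key=lambda w: w[2])[:top]


best = hydrophilic_peptides("LLLLLLLLDEKRDEKRLLLLLLLL")[0]
print(best[:2])  # (8, 'DEKRDEKR')
```

Low hydropathy alone does not guarantee surface exposure or immunogenicity; it is only a first-pass filter consistent with the preference for hydrophilic regions stated above.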

An immunogen typically is used to prepare antibodies by
immunizing a suitable subject (e.g., rabbit, goat, mouse,
or other mammal). An appropriate immunogenic preparation
can contain, for example, recombinantly expressed or
chemically synthesized polypeptide. The preparation can
further include an adjuvant, such as Freund's complete or
incomplete adjuvant, or a similar immunostimulatory agent.

Accordingly, another aspect of the invention pertains
to antibodies directed against a polypeptide of the
invention. The term "antibody" as used herein refers to
immunoglobulin molecules and immunologically active
portions of immunoglobulin molecules, i.e., molecules
that contain an antigen binding site which specifically
binds an antigen, such as a polypeptide of the invention.
A molecule which specifically binds to a given
polypeptide of the invention is a molecule which binds
the polypeptide, but does not substantially bind other
molecules in a sample, e.g., a biological sample, which
naturally contains the polypeptide. Examples of
immunologically active portions of immunoglobulin
molecules include F(ab) and F(ab')2 fragments which can be
generated by treating the antibody with an enzyme such as
pepsin. The invention provides polyclonal and monoclonal
antibodies. The term "monoclonal antibody" or
"monoclonal antibody composition", as used herein, refers
to a population of antibody molecules that contain only
one species of an antigen binding site capable of
immunoreacting with a particular epitope.

Polyclonal antibodies can be prepared as described
above by immunizing a suitable subject with a polypeptide
of the invention as an immunogen. The antibody titer in
the immunized subject can be monitored over time by
standard techniques, such as with an enzyme-linked
immunosorbent assay (ELISA) using immobilized
polypeptide. If desired, the antibody molecules can be
isolated from the mammal (e.g., from the blood) and
further purified by well-known techniques, such as
protein A chromatography to obtain the IgG fraction. At
an appropriate time after immunization, e.g., when the
specific antibody titers are highest, antibody-producing
cells can be obtained from the subject and used to
prepare monoclonal antibodies by standard techniques,
such as the hybridoma technique originally described by
Kohler and Milstein (1975) Nature 256:495-497, the human
B cell hybridoma technique (Kozbor et al. (1983) Immunol.
Today 4:72), the EBV-hybridoma technique (Cole et al.
(1985), Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96), or trioma techniques. The
technology for producing hybridomas is well known (see
generally Current Protocols in Immunology (1994) Coligan
et al. (eds.) John Wiley & Sons, Inc., New York, NY).

10 Hybridoma cells producing a monoclonal antibody of the 
invention are detected by screening the hybridoma culture 
supernatant s for antibodies that bind the polypeptide of 
interest, e.g., using a standard ELISA assay. 
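As a non-authoritative illustration of how a titer might be read from such an ELISA, the Python sketch below assumes a serial dilution of serum, one mean optical density (OD) reading per dilution, and a positivity cutoff defined as the mean background OD plus three standard deviations. The cutoff rule and the function names are assumptions for illustration; laboratories use a variety of endpoint definitions.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def cutoff_from_blanks(blank_od: List[float], n_sd: float = 3.0) -> float:
    """Assumed positivity cutoff: mean background OD plus n_sd standard deviations.

    At least two background readings are required to compute a standard deviation.
    """
    return mean(blank_od) + n_sd * stdev(blank_od)


def endpoint_titer(od_by_dilution: Dict[int, float], cutoff: float) -> Optional[int]:
    """Highest reciprocal dilution reached before the signal falls to background.

    od_by_dilution maps a reciprocal dilution (e.g. 100 for 1:100) to the mean
    OD measured on antigen-coated wells. Returns None if no dilution is positive.
    """
    titer = None
    for dilution in sorted(od_by_dilution):
        if od_by_dilution[dilution] > cutoff:
            titer = dilution
        else:
            break  # stop at the first dilution that is no longer above the cutoff
    return titer
```

For instance, with readings of 2.1, 1.4, 0.6 and 0.09 at 1:100, 1:1,000, 1:10,000 and 1:100,000 and a cutoff of 0.1, `endpoint_titer` returns 10000, i.e., a titer of 1:10,000. Comparing such values across successive bleeds is one way the titer could be tracked over time.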

As an alternative to preparing monoclonal
antibody-secreting hybridomas, a monoclonal antibody
directed against a polypeptide of the invention can be
identified and isolated by screening a recombinant
combinatorial immunoglobulin library (e.g., an antibody
phage display library) with the polypeptide of interest.
Kits for generating and screening phage display libraries
are commercially available (e.g., the Pharmacia
Recombinant Phage Antibody System, Catalog No.
27-9400-01; and the Stratagene SurfZAP™ Phage Display
Kit, Catalog No. 240612). Additionally, examples of
methods and reagents particularly amenable for use in
generating and screening antibody display libraries can
be found in, for example, U.S. Patent No. 5,223,409; PCT
Publication No. WO 92/18619; PCT Publication No. WO
91/17271; PCT Publication No. WO 92/20791; PCT
Publication No. WO 92/15679; PCT Publication No. WO
93/01288; PCT Publication No. WO 92/01047; PCT
Publication No. WO 92/09690; PCT Publication No. WO
90/02809; Fuchs et al. (1991) Bio/Technology 9:1370-1372;
Hay et al. (1992) Hum. Antibod. Hybridomas 3:81-85; Huse
et al. (1989) Science 246:1275-1281; Griffiths et al.
(1993) EMBO J. 12:725-734.

Additionally, recombinant antibodies, such as chimeric
and humanized monoclonal antibodies, comprising both
human and non-human portions, are within the scope of the
invention. Such chimeric and humanized monoclonal
antibodies can be produced by standard recombinant DNA
techniques known in the art, for example using methods
described in PCT Publication No. WO 87/02671; European
Patent Application 184,187; European Patent Application
171,496; European Patent Application 173,494; PCT
Publication No. WO 86/01533; U.S. Patent No. 4,816,567;
European Patent Application 125,023; Better et al. (1988)
Science 240:1041-1043; Liu et al. (1987) Proc. Natl.
Acad. Sci. USA 84:3439-3443; Liu et al. (1987) J.
Immunol. 139:3521-3526; Sun et al. (1987) Proc. Natl.
Acad. Sci. USA 84:214-218; Nishimura et al. (1987) Canc.
Res. 47:999-1005; Wood et al. (1985) Nature 314:446-449;
Shaw et al. (1988) J. Natl. Cancer Inst. 80:1553-1559;
Morrison (1985) Science 229:1202-1207; Oi et al. (1986)
Bio/Techniques 4:214; U.S. Patent 5,225,539; Jones et al.
(1986) Nature 321:522-525; Verhoeyen et al. (1988)
Science 239:1534; and Beidler et al. (1988) J. Immunol.
141:4053-4060.

Completely human antibodies are particularly desirable
for therapeutic treatment of human patients. Such
antibodies can be produced using transgenic mice which
are incapable of expressing endogenous immunoglobulin
heavy and light chain genes, but which can express human
heavy and light chain genes. The transgenic mice are
immunized in the normal fashion with a selected antigen,
e.g., all or a portion of a polypeptide of the invention.
Monoclonal antibodies directed against the antigen can be
obtained using conventional hybridoma technology. The
human immunoglobulin transgenes harbored by the
transgenic mice rearrange during B cell differentiation,
and subsequently undergo class switching and somatic
mutation. Thus, using such a technique, it is possible
to produce therapeutically useful IgG, IgA and IgE
antibodies. For an overview of this technology for
producing human antibodies, see Lonberg and Huszar (1995)
Int. Rev. Immunol. 13:65-93. For a detailed discussion
of this technology for producing human antibodies and
human monoclonal antibodies and protocols for producing
such antibodies, see, e.g., U.S. Patent 5,625,126; U.S.
Patent 5,633,425; U.S. Patent 5,569,825; U.S. Patent
5,661,016; and U.S. Patent 5,545,806. In addition,
companies such as Abgenix, Inc. (Fremont, CA) can be
engaged to provide human antibodies directed against a
selected antigen using technology similar to that
described above.

Completely human antibodies which recognize a selected
epitope can be generated using a technique referred to as
"guided selection." In this approach, a selected
non-human monoclonal antibody, e.g., a murine antibody,
is used to guide the selection of a completely human
antibody recognizing the same epitope.

An antibody directed against a polypeptide of the
invention (e.g., a monoclonal antibody) can be used to
isolate the polypeptide by standard techniques, such as
affinity chromatography or immunoprecipitation.
Moreover, such an antibody can be used to detect the
protein (e.g., in a cellular lysate or cell supernatant)
in order to evaluate the abundance and pattern of
expression of the polypeptide. The antibodies can also
be used diagnostically to monitor protein levels in
tissue as part of a clinical testing procedure, e.g., to
determine the efficacy of a given treatment regimen.
Detection can be facilitated by coupling the antibody to
a detectable substance. Examples of detectable substances
include various enzymes, prosthetic groups, fluorescent
materials, luminescent materials, bioluminescent
materials, and radioactive materials. Examples of
suitable enzymes include horseradish peroxidase, alkaline
phosphatase, β-galactosidase, or acetylcholinesterase;
examples of suitable prosthetic group complexes include
streptavidin/biotin and avidin/biotin; examples of
suitable fluorescent materials include umbelliferone,
fluorescein, fluorescein isothiocyanate, rhodamine,
dichlorotriazinylamine fluorescein, dansyl chloride or
phycoerythrin; an example of a luminescent material
includes luminol; examples of bioluminescent materials
include luciferase, luciferin, and aequorin; and
examples of suitable radioactive material include ¹²⁵I,
¹³¹I, ³⁵S or ³H.

III. Recombinant Expression Vectors and Host Cells

Another aspect of the invention pertains to vectors,
preferably expression vectors, containing a nucleic acid
encoding a polypeptide of the invention (or a portion
thereof). As used herein, the term "vector" refers to a
nucleic acid molecule capable of transporting another
nucleic acid to which it has been linked. One type of
vector is a "plasmid", which refers to a circular
double-stranded DNA loop into which additional DNA
segments can be ligated. Another type of vector is a
viral vector, wherein additional DNA segments can be
ligated into the viral genome. Certain vectors are
capable of autonomous replication in a host cell into
which they are introduced (e.g., bacterial vectors having
a bacterial origin of replication and episomal mammalian
vectors). Other vectors (e.g., non-episomal mammalian
vectors) are integrated into the genome of a host cell
upon introduction into the host cell, and thereby are
replicated along with the host genome. Moreover, certain
vectors, referred to herein as expression vectors, are
capable of directing the expression of genes to which
they are operably linked. In general, expression vectors
of utility in recombinant DNA techniques are often in the
form of plasmids. However, the invention is intended to
include such other forms of expression vectors, such as
viral vectors (e.g., replication defective retroviruses,
adenoviruses and adeno-associated viruses), which serve
equivalent functions.

The recombinant expression vectors of the invention
comprise a nucleic acid of the invention in a form
suitable for expression of the nucleic acid in a host
cell. This means that the recombinant expression vectors
include one or more regulatory sequences, selected on the
basis of the host cells to be used for expression, which
are operably linked to the nucleic acid sequence to be
expressed. Within a recombinant expression vector,
"operably linked" is intended to mean that the nucleotide
sequence of interest is linked to the regulatory
sequence(s) in a manner which allows for expression of
the nucleotide sequence (e.g., in an in vitro
transcription/translation system or in a host cell when
the vector is introduced into the host cell). The term
"regulatory sequence" is intended to include promoters,
enhancers and other expression control elements (e.g.,
polyadenylation signals). Such regulatory sequences are
described, for example, in Goeddel, Gene Expression
Technology: Methods in Enzymology 185, Academic Press,
San Diego, CA (1990). Regulatory sequences include those
which direct constitutive expression of a nucleotide
sequence in many types of host cell and those which
direct expression of the nucleotide sequence only in
certain host cells (e.g., tissue-specific regulatory
sequences). It will be appreciated by those skilled in
the art that the design of the expression vector can
depend on such factors as the choice of the host cell to
be transformed, the level of expression of protein
desired, etc. The expression vectors of the invention
can be introduced into host cells to thereby produce
proteins or peptides, including fusion proteins or
peptides, encoded by nucleic acids as described herein.

The recombinant expression vectors of the invention can
be designed for expression of a polypeptide of the
invention in prokaryotic or eukaryotic cells, e.g.,
bacterial cells such as E. coli, insect cells (using
baculovirus expression vectors), yeast cells or mammalian
cells. Suitable host cells are discussed further in
Goeddel, supra. Alternatively, the recombinant
expression vector can be transcribed and translated in
vitro, for example using T7 promoter regulatory sequences
and T7 polymerase.

Expression of proteins in prokaryotes is most often
carried out in E. coli with vectors containing
constitutive or inducible promoters directing the
expression of either fusion or non-fusion proteins.
Fusion vectors add a number of amino acids to a protein
encoded therein, usually to the amino terminus of the
recombinant protein. Such fusion vectors typically serve
three purposes: 1) to increase expression of recombinant
protein; 2) to increase the solubility of the recombinant
protein; and 3) to aid in the purification of the
recombinant protein by acting as a ligand in affinity
purification. Often, in fusion expression vectors, a
proteolytic cleavage site is introduced at the junction
of the fusion moiety and the recombinant protein to
enable separation of the recombinant protein from the
fusion moiety subsequent to purification of the fusion
protein. Proteases suitable for this purpose, and their
cognate recognition sequences, include Factor Xa,
thrombin and enterokinase. Typical fusion expression
vectors include pGEX (Pharmacia Biotech Inc; Smith and
Johnson (1988) Gene 67:31-40), pMAL (New England Biolabs,
Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ), which
fuse glutathione S-transferase (GST), maltose E binding
protein, or protein A, respectively, to the target
recombinant protein.

Examples of suitable inducible non-fusion E. coli
expression vectors include pTrc (Amann et al. (1988)
Gene 69:301-315) and pET 11d (Studier et al., Gene
Expression Technology: Methods in Enzymology 185,
Academic Press, San Diego, California (1990) 60-89).
Target gene expression from the pTrc vector relies on
host RNA polymerase transcription from a hybrid trp-lac
fusion promoter. Target gene expression from the pET 11d
vector relies on transcription from a T7 gn10-lac fusion
promoter mediated by a coexpressed viral RNA polymerase
(T7 gn1). This viral polymerase is supplied by host
strains BL21(DE3) or HMS174(DE3) from a resident λ
prophage harboring a T7 gn1 gene under the
transcriptional control of the lacUV5 promoter.

One strategy to maximize recombinant protein expression
in E. coli is to express the protein in a host bacterium
with an impaired capacity to proteolytically cleave the
recombinant protein (Gottesman, Gene Expression
Technology: Methods in Enzymology 185, Academic Press,
San Diego, California (1990) 119-128). Another strategy
is to alter the nucleic acid sequence of the nucleic acid
to be inserted into an expression vector so that the
individual codons for each amino acid are those
preferentially utilized in E. coli (Wada et al. (1992)
Nucleic Acids Res. 20:2111-2118). Such alteration of
nucleic acid sequences of the invention can be carried
out by standard DNA synthesis techniques.
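The codon-replacement strategy can be sketched in Python as a simple back-translation in which each residue is encoded by a single codon favored in E. coli. The codon table below is an illustrative assumption (one frequently used codon per amino acid, not a table taken from Wada et al.); practical designs would draw on a measured codon usage table and would also weigh factors such as mRNA secondary structure. The standard genetic code is included only to confirm that the recoded sequence still encodes the same polypeptide.

```python
from itertools import product

# Standard genetic code, built from the compact TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative (assumed) choice of one E. coli-preferred codon per residue;
# "*" denotes a stop codon.
PREFERRED_ECOLI_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def codon_optimize(protein: str) -> str:
    """Back-translate a protein sequence using the preferred codon table."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_ECOLI_CODON:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(PREFERRED_ECOLI_CODON[residue])
    return "".join(codons)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence with the standard genetic code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    # Sanity check: recoding must not change the encoded polypeptide.
    peptide = "MKTAYIAK*"
    recoded = codon_optimize(peptide)
    assert translate(recoded) == peptide
    print(recoded)
```

Using a single codon per residue is the simplest form of this idea; it maximizes use of favored codons but can create repetitive sequence, which is one reason measured usage tables are often sampled rather than applied uniformly.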

In another embodiment, the expression vector is a yeast
expression vector. Examples of vectors for expression in
yeast S. cerevisiae include pYepSec1 (Baldari et al.
(1987) EMBO J. 6:229-234), pMFα (Kurjan and Herskowitz
(1982) Cell 30:933-943), pJRY88 (Schultz et al. (1987)
Gene 54:113-123), pYES2 (Invitrogen Corporation, San
Diego, CA), and pPicZ (Invitrogen Corp, San Diego, CA).
Alternatively, the expression vector is a baculovirus
expression vector. Baculovirus vectors available for
expression of proteins in cultured insect cells (e.g., Sf9
cells) include the pAc series (Smith et al. (1983) Mol.
Cell Biol. 3:2156-2165) and the pVL series (Luckow and
Summers (1989) Virology 170:31-39).

In yet another embodiment, a nucleic acid of the
invention is expressed in mammalian cells using a
mammalian expression vector. Examples of mammalian
expression vectors include pCDM8 (Seed (1987) Nature
329:840) and pMT2PC (Kaufman et al. (1987) EMBO J.
6:187-195). When used in mammalian cells, the expression
vector's control functions are often provided by viral
regulatory elements. For example, commonly used
promoters are derived from polyoma, Adenovirus 2,
cytomegalovirus and Simian Virus 40. For other suitable
expression systems for both prokaryotic and eukaryotic
cells, see chapters 16 and 17 of Sambrook et al., supra.

In another embodiment, the recombinant mammalian
expression vector is capable of directing expression of
the nucleic acid preferentially in a particular cell type
(e.g., tissue-specific regulatory elements are used to
express the nucleic acid). Tissue-specific regulatory
elements are known in the art. Non-limiting examples of
suitable tissue-specific promoters include the albumin
promoter (liver-specific; Pinkert et al. (1987) Genes
Dev. 1:268-277), lymphoid-specific promoters (Calame and
Eaton (1988) Adv. Immunol. 43:235-275), in particular
promoters of T cell receptors (Winoto and Baltimore
(1989) EMBO J. 8:729-733) and immunoglobulins (Banerji et
al. (1983) Cell 33:729-740; Queen and Baltimore (1983)
Cell 33:741-748), neuron-specific promoters (e.g., the
neurofilament promoter; Byrne and Ruddle (1989) Proc.
Natl. Acad. Sci. USA 86:5473-5477), pancreas-specific
promoters (Edlund et al. (1985) Science 230:912-916), and
mammary gland-specific promoters (e.g., milk whey
promoter; U.S. Patent No. 4,873,316 and European
Application Publication No. 264,166). Developmentally
regulated promoters are also encompassed, for example the
murine hox promoters (Kessel and Gruss (1990) Science
249:374-379) and the α-fetoprotein promoter (Camper and
Tilghman (1989) Genes Dev. 3:537-546).

The invention further provides a recombinant expression
vector comprising a DNA molecule of the invention cloned
into the expression vector in an antisense orientation.
That is, the DNA molecule is operably linked to a
regulatory sequence in a manner which allows for
expression (by transcription of the DNA molecule) of an
RNA molecule which is antisense to the mRNA encoding a
polypeptide of the invention. Regulatory sequences
operably linked to a nucleic acid cloned in the antisense
orientation can be chosen which direct the continuous
expression of the antisense RNA molecule in a variety of
cell types, for instance viral promoters and/or
enhancers, or regulatory sequences can be chosen which
direct constitutive, tissue-specific or cell type-specific
expression of antisense RNA. The antisense expression
vector can be in the form of a recombinant plasmid,
phagemid or attenuated virus in which antisense nucleic
acids are produced under the control of a high
efficiency regulatory region, the activity of which can
be determined by the cell type into which the vector is
introduced. For a discussion of the regulation of gene
expression using antisense genes, see Weintraub et al.
(Reviews - Trends in Genetics, Vol. 1(1) 1986).
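To make the antisense orientation concrete, the Python sketch below derives the RNA that would be transcribed from an insert cloned in reverse, namely the reverse complement of the sense (coding) strand with U in place of T, and checks that it can base-pair along its full length with the corresponding sense mRNA. The sequences and function names are hypothetical, and the pairing check is a simplification that considers only Watson-Crick pairs, ignoring G·U wobble, mismatches and secondary structure.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def antisense_rna(sense_dna: str) -> str:
    """RNA produced when a sense-strand DNA insert is cloned in reverse.

    The transcript is the reverse complement of the sense strand, written
    5'->3' with U substituted for T.
    """
    dna = sense_dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G and T")
    return dna.translate(_DNA_COMPLEMENT)[::-1].replace("T", "U")


def forms_perfect_duplex(mrna: str, antisense: str) -> bool:
    """True if two 5'->3' RNAs pair antiparallel using Watson-Crick pairs only."""
    if len(mrna) != len(antisense):
        return False
    # Position i of the mRNA pairs with position i counted from the 3' end
    # of the antisense RNA (the strands run in opposite directions).
    return all((a, b) in _RNA_PAIRS
               for a, b in zip(mrna.upper(), reversed(antisense.upper())))
```

For a hypothetical sense sequence ATGGCC, `antisense_rna` returns GGCCAU, which pairs fully with the mRNA AUGGCC.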
Another aspect of the invention pertains to host cells
into which a recombinant expression vector of the
invention has been introduced. The terms "host cell" and
"recombinant host cell" are used interchangeably herein.
It is understood that such terms refer not only to the
particular subject cell but also to the progeny or
potential progeny of such a cell. Because certain
modifications may occur in succeeding generations due to
either mutation or environmental influences, such progeny
may not, in fact, be identical to the parent cell, but
are still included within the scope of the term as used
herein.

A host cell can be any prokaryotic cell (e.g., E. coli)
or eukaryotic cell (e.g., an insect cell, a yeast cell or
a mammalian cell).

Vector DNA can be introduced into prokaryotic or
eukaryotic cells via conventional transformation or
transfection techniques. As used herein, the terms
"transformation" and "transfection" are intended to refer
to a variety of art-recognized techniques for introducing
foreign nucleic acid into a host cell, including calcium
phosphate or calcium chloride co-precipitation,
DEAE-dextran-mediated transfection, lipofection, or
electroporation. Suitable methods for transforming or
transfecting host cells can be found in Sambrook et al.
(supra) and other laboratory manuals.

For stable transfection of mammalian cells, it is known
that, depending upon the expression vector and
transfection technique used, only a small fraction of
cells may integrate the foreign DNA into their genome.
In order to identify and select these integrants, a gene
that encodes a selectable marker (e.g., for resistance to
antibiotics) is generally introduced into the host cells
along with the gene of interest. Preferred selectable
markers include those which confer resistance to drugs,
such as G418, hygromycin and methotrexate. Cells stably
transfected with the introduced nucleic acid can be
identified by drug selection (e.g., cells that have
incorporated the selectable marker gene will survive,
while the other cells die).

A host cell of the invention, such as a prokaryotic or
eukaryotic host cell in culture, can be used to produce a
polypeptide of the invention. Accordingly, the invention
further provides methods for producing a polypeptide of
the invention using the host cells of the invention. In
one embodiment, the method comprises culturing the host
cell of the invention (into which a recombinant expression
vector encoding a polypeptide of the invention has been
introduced) in a suitable medium such that the
polypeptide is produced. In another embodiment, the
method further comprises isolating the polypeptide from
the medium or the host cell.

The host cells of the invention can also be used to
produce non-human transgenic animals. For example, in one
embodiment, a host cell of the invention is a fertilized
oocyte or an embryonic stem cell into which sequences
encoding a polypeptide of the invention have been
introduced. Such host cells can then be used to create
non-human transgenic animals in which exogenous sequences
encoding a polypeptide of the invention have been
introduced into their genome or homologous recombinant
animals in which endogenous sequences encoding a
polypeptide of the invention have been altered. Such
animals are useful for studying the function and/or
activity of the polypeptide and for identifying and/or
evaluating modulators of polypeptide activity. As used
herein, a "transgenic animal" is a non-human animal,
preferably a mammal, more preferably a rodent such as a
rat or mouse, in which one or more of the cells of the
animal includes a transgene. Other examples of transgenic
animals include non-human primates, sheep, dogs, cows,
goats, chickens, amphibians, etc. A transgene is
exogenous DNA which is integrated into the genome of a
cell from which a transgenic animal develops and which
remains in the genome of the mature animal, thereby
directing the expression of an encoded gene product in
one or more cell types or tissues of the transgenic
animal. As used herein, a "homologous recombinant
animal" is a non-human animal, preferably a mammal, more
preferably a mouse, in which an endogenous gene has been
altered by homologous recombination between the
endogenous gene and an exogenous DNA molecule introduced
into a cell of the animal, e.g., an embryonic cell of the
animal, prior to development of the animal.

A transgenic animal of the invention can be created by
introducing nucleic acid encoding a polypeptide of the
invention (or a homologue thereof) into the male
pronucleus of a fertilized oocyte, e.g., by
microinjection or retroviral infection, and allowing the
oocyte to develop in a pseudopregnant female foster
animal. Intronic sequences and polyadenylation signals
can also be included in the transgene to increase the
efficiency of expression of the transgene. A
tissue-specific regulatory sequence(s) can be operably
linked to the transgene to direct expression of the
polypeptide of the invention to particular cells. Methods
for generating transgenic animals via embryo manipulation
and microinjection, particularly animals such as mice,
have become conventional in the art and are described,
for example, in U.S. Patent Nos. 4,736,866 and 4,870,009,
U.S. Patent No. 4,873,191 and in Hogan, Manipulating the
Mouse Embryo (Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y., 1986). Similar methods are used for
production of other transgenic animals. A transgenic
founder animal can be identified based upon the presence
of the transgene in its genome and/or expression of mRNA
encoding the transgene in tissues or cells of the
animals. A transgenic founder animal can then be used to
breed additional animals carrying the transgene.
Moreover, transgenic animals carrying the transgene can
further be bred to other transgenic animals carrying
other transgenes.

To create a homologous recombinant animal, a vector is
prepared which contains at least a portion of a gene
encoding a polypeptide of the invention into which a
deletion, addition or substitution has been introduced to
thereby alter, e.g., functionally disrupt, the gene. In
a preferred embodiment, the vector is designed such that,
upon homologous recombination, the endogenous gene is
functionally disrupted (i.e., no longer encodes a
functional protein; also referred to as a "knock out"
vector). Alternatively, the vector can be designed such
that, upon homologous recombination, the endogenous gene
is mutated or otherwise altered but still encodes
functional protein (e.g., the upstream regulatory region
can be altered to thereby alter the expression of the
endogenous protein). In the homologous recombination
vector, the altered portion of the gene is flanked at its
5' and 3' ends by additional nucleic acid of the gene to
allow for homologous recombination to occur between the
exogenous gene carried by the vector and an endogenous
gene in an embryonic stem cell. The additional flanking
nucleic acid sequences are of sufficient length for
successful homologous recombination with the endogenous
gene. Typically, several kilobases of flanking DNA (both
at the 5' and 3' ends) are included in the vector (see,
e.g., Thomas and Capecchi (1987) Cell 51:503 for a
description of homologous recombination vectors). The
vector is introduced into an embryonic stem cell line
(e.g., by electroporation) and cells in which the
introduced gene has homologously recombined with the
endogenous gene are selected (see, e.g., Li et al. (1992)
Cell 69:915). The selected cells are then injected into
a blastocyst of an animal (e.g., a mouse) to form
aggregation chimeras (see, e.g., Bradley in
Teratocarcinomas and Embryonic Stem Cells: A Practical
Approach, Robertson, ed. (IRL, Oxford, 1987) pp.
113-152). A chimeric embryo can then be implanted into a
suitable pseudopregnant female foster animal and the
embryo brought to term. Progeny harboring the
homologously recombined DNA in their germ cells can be
used to breed animals in which all cells of the animal
contain the homologously recombined DNA by germline
transmission of the transgene. Methods for constructing
homologous recombination vectors and homologous
recombinant animals are described further in Bradley
(1991) Current Opinion in Bio/Technology 2:823-829 and in
PCT Publication Nos. WO 90/11354, WO 91/01140, WO
92/0968, and WO 93/04169.
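The layout of a replacement-type knock-out vector, in which the altered portion of the gene is flanked by 5' and 3' homology arms, can be illustrated with the Python sketch below. The genomic sequence, coordinates, arm length and marker cassette are hypothetical placeholders (in practice the arms typically span several kilobases and the replacement would be, for example, a drug-resistance cassette); the sketch only assembles sequence strings and does not model recombination efficiency or selection.

```python
from dataclasses import dataclass


@dataclass
class TargetingConstruct:
    """Hypothetical container for the three parts of a replacement vector insert."""
    five_prime_arm: str
    replacement: str
    three_prime_arm: str

    @property
    def sequence(self) -> str:
        # 5' homology arm, then the replacement cassette, then the 3' homology arm.
        return self.five_prime_arm + self.replacement + self.three_prime_arm


def design_knockout(genomic: str, delete_start: int, delete_end: int,
                    marker: str, arm_length: int) -> TargetingConstruct:
    """Replace genomic[delete_start:delete_end] with a marker cassette.

    Each side keeps arm_length bases of the gene's own sequence as homology.
    """
    if not 0 <= delete_start < delete_end <= len(genomic):
        raise ValueError("invalid deletion coordinates")
    if delete_start < arm_length or len(genomic) - delete_end < arm_length:
        raise ValueError("not enough flanking sequence for the requested arms")
    return TargetingConstruct(
        five_prime_arm=genomic[delete_start - arm_length:delete_start],
        replacement=marker,
        three_prime_arm=genomic[delete_end:delete_end + arm_length],
    )
```

For instance, `design_knockout("AAAACCCCGGGGTTTT", 4, 8, "NNN", 4).sequence` gives AAAANNNGGGG: the CCCC segment is replaced by the placeholder cassette and flanked by four bases of homology on each side.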

In another embodiment, transgenic non-human animals can
be produced which contain selected systems which allow
for regulated expression of the transgene. One example
of such a system is the cre/loxP recombinase system of
bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso et al. (1992) Proc.
Natl. Acad. Sci. USA 89:6232-6236. Another example of a
recombinase system is the FLP recombinase system of
Saccharomyces cerevisiae (O'Gorman et al. (1991) Science
251:1351-1355). If a cre/loxP recombinase system is used
to regulate expression of the transgene, animals
containing transgenes encoding both the Cre recombinase
and a selected protein are required. Such animals can be
provided through the construction of "double" transgenic
animals, e.g., by mating two transgenic animals, one
containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described
herein can also be produced according to the methods
described in Wilmut et al. (1997) Nature 385:810-813 and
PCT Publication Nos. WO 97/07668 and WO 97/07669.

IV. Pharmaceutical Compositions 

The nucleic acid molecules, polypeptides, and
antibodies (also referred to herein as "active
compounds") of the invention can be incorporated into
pharmaceutical compositions suitable for administration.
Such compositions typically comprise the nucleic acid
molecule, protein, or antibody and a pharmaceutically
acceptable carrier. As used herein the language
"pharmaceutically acceptable carrier" is intended to
include any and all solvents, dispersion media, coatings,
antibacterial and antifungal agents, isotonic and
absorption delaying agents, and the like, compatible with
pharmaceutical administration. The use of such media and
agents for pharmaceutically active substances is well
known in the art. Except insofar as any conventional
media or agent is incompatible with the active compound,
use thereof in the compositions is contemplated.
Supplementary active compounds can also be incorporated
into the compositions.

The invention includes methods for preparing
pharmaceutical compositions for modulating the expression
or activity of a polypeptide or nucleic acid of the
invention. Such methods comprise formulating a
pharmaceutically acceptable carrier with an agent which
modulates expression or activity of a polypeptide or
nucleic acid of the invention. Such compositions can
further include additional active agents. Thus, the
invention further includes methods for preparing a
pharmaceutical composition by formulating a
pharmaceutically acceptable carrier with an agent which
modulates expression or activity of a polypeptide or
nucleic acid of the invention and one or more additional
active compounds.

A pharmaceutical composition of the invention is
formulated to be compatible with its intended route of
administration. Examples of routes of administration
include parenteral (e.g., intravenous, intradermal, and
subcutaneous), oral, inhalation, transdermal (topical),
transmucosal, and rectal administration. Solutions or
suspensions used for parenteral, intradermal, or
subcutaneous application can include the following
components: a sterile diluent such as water for
injection, saline solution, fixed oils, polyethylene
glycols, glycerine, propylene glycol or other synthetic
solvents; antibacterial agents such as benzyl alcohol or
methyl parabens; antioxidants such as ascorbic acid or
sodium bisulfite; chelating agents such as
ethylenediaminetetraacetic acid; buffers such as
acetates, citrates or phosphates; and agents for the
adjustment of tonicity such as sodium chloride or
dextrose. The pH can be adjusted with acids or bases,
such as hydrochloric acid or sodium hydroxide. The
parenteral preparation can be enclosed in ampoules,
disposable syringes or multiple dose vials made of glass
or plastic.

Pharmaceutical compositions suitable for injectable use
include sterile aqueous solutions (where water soluble)
or dispersions and sterile powders for the extemporaneous
preparation of sterile injectable solutions or
dispersions. For intravenous administration, suitable
carriers include physiological saline, bacteriostatic
water, Cremophor EL™ (BASF; Parsippany, NJ) or phosphate
buffered saline (PBS). In all cases, the composition
must be sterile and should be fluid to the extent that
easy syringability exists. It must be stable under the
conditions of manufacture and storage and must be
preserved against the contaminating action of
microorganisms such as bacteria and fungi. The carrier
can be a solvent or dispersion medium containing, for
example, water, ethanol, polyol (for example, glycerol,
propylene glycol, and liquid polyethylene glycol, and
the like), and suitable mixtures thereof. The proper
fluidity can be maintained, for example, by the use of a
coating such as lecithin, by the maintenance of the
required particle size in the case of dispersion and by
the use of surfactants. Prevention of the action of
microorganisms can be achieved by various antibacterial
and antifungal agents, for example, parabens,
chlorobutanol, phenol, ascorbic acid, thimerosal, and the
like. In many cases, it will be preferable to include
isotonic agents, for example, sugars, polyalcohols such
as mannitol or sorbitol, or sodium chloride in the
composition. Prolonged absorption of the injectable
compositions can be brought about by including in the
composition an agent which delays absorption, for
example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., a polypeptide or antibody) in the required amount in an appropriate solvent with one or a combination of the ingredients enumerated above, as required, followed by filter sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They can be enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and expectorated or swallowed. Pharmaceutically compatible binding agents and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches, and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth, or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art.

The compounds can also be prepared in the form of suppositories (e.g., with conventional suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems. Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent to those skilled in the art. The materials can also be obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Patent No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subject to be treated, each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specification for the dosage unit forms of the invention is dictated by and directly dependent on the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and the limitations inherent in the art of compounding such an active compound for the treatment of individuals.

For antibodies, the preferred dosage is 0.1 mg/kg to 100 mg/kg of body weight (generally 10 mg/kg to 20 mg/kg). If the antibody is to act in the brain, a dosage of 50 mg/kg to 100 mg/kg is usually appropriate. Generally, partially human antibodies and fully human antibodies have a longer half-life within the human body than other antibodies. Accordingly, lower dosages and less frequent administration are often possible. Modifications such as lipidation can be used to stabilize antibodies and to enhance uptake and tissue penetration (e.g., into the brain). A method for lipidation of antibodies is described by Cruikshank et al. ((1997) J. Acquired Immune Deficiency Syndromes and Human Retrovirology 14:193).
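Weight-based (mg/kg) ranges such as those above convert to a total dose by multiplying by the subject's body weight. The short Python sketch below illustrates only that arithmetic. The function and constant names and the 70 kg example subject are hypothetical, and the sketch is not presented as a dosing procedure.

```python
# Minimal sketch: converting the weight-based antibody dosage ranges stated in
# the text into total doses for a given body weight. Names and the example
# body weight are hypothetical illustrations.

GENERAL_RANGE_MG_PER_KG = (0.1, 100.0)  # preferred range stated in the text
TYPICAL_RANGE_MG_PER_KG = (10.0, 20.0)  # "generally" range stated in the text
BRAIN_RANGE_MG_PER_KG = (50.0, 100.0)   # range stated for antibodies acting in the brain


def total_dose_mg(body_weight_kg: float, dose_mg_per_kg: float) -> float:
    """Return the total dose in mg for a weight-based dose."""
    if body_weight_kg <= 0 or dose_mg_per_kg <= 0:
        raise ValueError("body weight and dose must be positive")
    return body_weight_kg * dose_mg_per_kg


def total_dose_range_mg(body_weight_kg: float, mg_per_kg_range: tuple) -> tuple:
    """Return the (low, high) total dose in mg for a (low, high) mg/kg range."""
    low, high = mg_per_kg_range
    return total_dose_mg(body_weight_kg, low), total_dose_mg(body_weight_kg, high)


if __name__ == "__main__":
    # Hypothetical 70 kg subject
    print(total_dose_range_mg(70, TYPICAL_RANGE_MG_PER_KG))  # (700.0, 1400.0)
    print(total_dose_range_mg(70, BRAIN_RANGE_MG_PER_KG))    # (3500.0, 7000.0)
```

The same multiplication applies to any of the three ranges; only the mg/kg bounds change.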

The nucleic acid molecules of the invention can be inserted into vectors and used as gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (U.S. Patent 5,328,470), or stereotactic injection (see, e.g., Chen et al. (1994) Proc. Natl. Acad. Sci. USA 91:3054-3057). The pharmaceutical preparation of the gene therapy vector can include the gene therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is embedded. Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells which produce the gene delivery system.

The pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

V. Uses and Methods of the Invention 

The nucleic acid molecules, proteins, protein homologues, and antibodies described herein can be used in one or more of the following methods: a) screening assays; b) detection assays (e.g., chromosomal mapping, tissue typing, forensic biology); c) predictive medicine (e.g., diagnostic assays, prognostic assays, monitoring clinical trials, and pharmacogenomics); and d) methods of treatment (e.g., therapeutic and prophylactic). For example, polypeptides of the invention can be used to (i) modulate cellular proliferation; (ii) modulate cellular differentiation; and (iii) modulate cell survival. The isolated nucleic acid molecules of the invention can be used to express proteins (e.g., via a recombinant expression vector in a host cell in gene therapy applications), to detect mRNA (e.g., in a biological sample) or a genetic lesion, and to modulate activity of a polypeptide of the invention. In addition, the polypeptides of the invention can be used to screen drugs or compounds which modulate activity or expression of a polypeptide of the invention, as well as to treat disorders characterized by insufficient or excessive production of a protein of the invention or production of a form of a protein of the invention which has decreased or aberrant activity compared to the wild type protein. In addition, the antibodies of the invention can be used to detect and isolate a protein of the invention and to modulate activity of a protein of the invention.

This invention further pertains to novel agents identified by the above-described screening assays and uses thereof for treatments as described herein.



A. Screening Assays 
The invention provides a method (also referred to herein as a "screening assay") for identifying modulators, i.e., candidate or test compounds or agents (e.g., peptides, peptidomimetics, small molecules, or other drugs) which bind to a polypeptide of the invention or have a stimulatory or inhibitory effect on, for example, expression or activity of a polypeptide of the invention.

In one embodiment, the invention provides assays for screening candidate or test compounds which bind to or modulate the activity of the membrane-bound form of a polypeptide of the invention or biologically active portion thereof. The test compounds of the present invention can be obtained using any of the numerous approaches in combinatorial library methods known in the art, including: biological libraries; spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; the "one-bead one-compound" library method; and synthetic library methods using affinity chromatography selection. The biological library approach is limited to peptide libraries, while the other four approaches are applicable to peptide, non-peptide oligomer, or small molecule libraries of compounds (Lam (1997) Anticancer Drug Des. 12:145).

Examples of methods for the synthesis of molecular libraries can be found in the art, for example in: DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med. Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2061; and Gallop et al. (1994) J. Med. Chem. 37:1233.

Libraries of compounds may be presented in solution (e.g., Houghten (1992) Bio/Techniques 13:412-421), or on beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555-556), bacteria (U.S. Patent No. 5,223,409), spores (U.S. Patent Nos. 5,571,698; 5,403,484; and 5,223,409), plasmids (Cull et al. (1992) Proc. Natl. Acad. Sci. USA 89:1865-1869), or phage (Scott and Smith (1990) Science 249:386-390; Devlin (1990) Science 249:404-406; Cwirla et al. (1990) Proc. Natl. Acad. Sci. USA 87:6378-6382; and Felici (1991) J. Mol. Biol. 222:301-310).

In one embodiment, an assay is a cell-based assay in which a cell which expresses a membrane-bound form of a polypeptide of the invention, or a biologically active portion thereof, on the cell surface is contacted with a test compound and the ability of the test compound to bind to the polypeptide is determined. The cell, for example, can be a yeast cell or a cell of mammalian origin. Determining the ability of the test compound to bind to the polypeptide can be accomplished, for example, by coupling the test compound with a radioisotope or enzymatic label such that binding of the test compound to the polypeptide or biologically active portion thereof can be determined by detecting the labeled compound in a complex. For example, test compounds can be labeled with ¹²⁵I, ³⁵S, ¹⁴C, or ³H, either directly or indirectly, and the radioisotope detected by direct counting of radioemission or by scintillation counting. Alternatively, test compounds can be enzymatically labeled with, for example, horseradish peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by determination of conversion of an appropriate substrate to product. In a preferred embodiment, the assay comprises contacting a cell which expresses a membrane-bound form of a polypeptide of the invention, or a biologically active portion thereof, on the cell surface with a known compound which binds the polypeptide to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with the polypeptide, wherein determining the ability of the test compound to interact with the polypeptide comprises determining the ability of the test compound to preferentially bind to the polypeptide or a biologically active portion thereof as compared to the known compound.
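The text does not specify how preferential binding relative to the known compound is quantified. One common readout, assumed here, labels the known compound (e.g., with a radioisotope), leaves the test compound unlabeled, and reports the fraction of specifically bound label displaced by the test compound. The function name and the count values in this Python sketch are hypothetical.

```python
# Minimal sketch of a competition-binding readout, assuming the known compound
# carries the label and the test compound is unlabeled. Names and numbers are
# hypothetical.


def fraction_displaced(total_signal: float,
                       signal_with_test: float,
                       nonspecific_signal: float = 0.0) -> float:
    """Fraction of specifically bound labeled known compound displaced.

    total_signal: label bound with the known compound alone
    signal_with_test: label bound after adding the test compound
    nonspecific_signal: background binding not attributable to the polypeptide
    """
    specific = total_signal - nonspecific_signal
    if specific <= 0:
        raise ValueError("no specific binding above background")
    # Clamp at zero so readings below background count as full displacement.
    remaining = max(signal_with_test - nonspecific_signal, 0.0)
    return 1.0 - remaining / specific


if __name__ == "__main__":
    # Hypothetical counts per minute from scintillation counting
    print(round(fraction_displaced(10000, 4000, 1000), 3))  # 0.667
```

A value near 1 suggests the test compound competes effectively for the site occupied by the known compound; a value near 0 suggests little competition.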

In another embodiment, an assay is a cell-based assay comprising contacting a cell expressing a membrane-bound form of a polypeptide of the invention, or a biologically active portion thereof, on the cell surface with a test compound and determining the ability of the test compound to modulate (e.g., stimulate or inhibit) the activity of the polypeptide or biologically active portion thereof. Determining the ability of the test compound to modulate the activity of the polypeptide or a biologically active portion thereof can be accomplished, for example, by determining the ability of the polypeptide to bind to or interact with a target molecule.

Determining the ability of a polypeptide of the invention to bind to or interact with a target molecule can be accomplished by one of the methods described above for determining direct binding. As used herein, a "target molecule" is a molecule with which a selected polypeptide (e.g., a polypeptide of the invention) binds or interacts in nature, for example, a molecule on the surface of a cell which expresses the selected protein, a molecule on the surface of a second cell, a molecule in the extracellular milieu, a molecule associated with the internal surface of a cell membrane, or a cytoplasmic molecule. A target molecule can be a polypeptide of the invention or some other polypeptide or protein. For example, a target molecule can be a component of a signal transduction pathway which facilitates transduction of an extracellular signal (e.g., a signal generated by binding of a compound to a polypeptide of the invention) through the cell membrane and into the cell, a second intracellular protein which has catalytic activity, or a protein which facilitates the association of downstream signaling molecules with a polypeptide of the invention. Determining the ability of a polypeptide of the invention to bind to or interact with a target molecule can be accomplished by determining the activity of the target molecule. For example, the activity of the target molecule can be determined by detecting induction of a cellular second messenger of the target (e.g., intracellular Ca²⁺, diacylglycerol, IP3, etc.), detecting catalytic/enzymatic activity of the target on an appropriate substrate, detecting the induction of a reporter gene (e.g., a regulatory element that is responsive to a polypeptide of the invention operably linked to a nucleic acid encoding a detectable marker, e.g., luciferase), or detecting a cellular response, for example, cellular differentiation or cell proliferation.

In yet another embodiment, an assay of the present invention is a cell-free assay comprising contacting a polypeptide of the invention or biologically active portion thereof with a test compound and determining the ability of the test compound to bind to the polypeptide or biologically active portion thereof. Binding of the test compound to the polypeptide can be determined either
directly or indirectly as described above. In a preferred embodiment, the assay includes contacting the polypeptide of the invention or biologically active portion thereof with a known compound which binds the polypeptide to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with the polypeptide, wherein determining the ability of the test compound to interact with the polypeptide comprises determining the ability of the test compound to preferentially bind to the polypeptide or biologically active portion thereof as compared to the known compound.

In another embodiment, an assay is a cell-free assay comprising contacting a polypeptide of the invention or biologically active portion thereof with a test compound and determining the ability of the test compound to modulate (e.g., stimulate or inhibit) the activity of the polypeptide or biologically active portion thereof. Determining the ability of the test compound to modulate the activity of the polypeptide can be accomplished, for example, by determining the ability of the polypeptide to bind to a target molecule by one of the methods described above for determining direct binding. In an alternative embodiment, determining the ability of the test compound to modulate the activity of the polypeptide can be accomplished by determining the ability of the polypeptide of the invention to further modulate the target molecule. For example, the catalytic/enzymatic activity of the target molecule on an appropriate substrate can be determined as previously described.

In yet another embodiment, the cell-free assay comprises contacting a polypeptide of the invention or biologically active portion thereof with a known compound which binds the polypeptide to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with the polypeptide, wherein determining the ability of the test compound to interact with the polypeptide comprises determining the ability of the polypeptide to preferentially bind to or modulate the activity of a target molecule.

The cell-free assays of the present invention are amenable to use of both a soluble form and the membrane-bound form of a polypeptide of the invention. In the case of cell-free assays comprising the membrane-bound form of the polypeptide, it may be desirable to utilize a solubilizing agent such that the membrane-bound form of the polypeptide is maintained in solution. Examples of such solubilizing agents include non-ionic or zwitterionic detergents such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton X-100, Triton X-114, Thesit, isotridecylpoly(ethylene glycol ether)n, 3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate (CHAPS), 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propane sulfonate (CHAPSO), or N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate.

In various embodiments of the above assay methods of the present invention, it may be desirable to immobilize either the polypeptide of the invention or its target molecule to facilitate separation of complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay. Binding of a test compound to the polypeptide, or interaction of the polypeptide with a target molecule in the presence and absence of a candidate compound, can be accomplished in any vessel suitable for containing the reactants. Examples of such vessels include microtitre plates, test tubes, and microcentrifuge tubes. In one embodiment, a fusion protein can be provided which adds a domain that allows one or both of the proteins to be bound to a matrix. For example, glutathione-S-transferase fusions of the polypeptide of the invention or of its target molecule can be adsorbed onto glutathione sepharose beads (Sigma Chemical; St. Louis, MO) or glutathione-derivatized microtitre plates, which are then combined with the test compound, or with the test compound and either the non-adsorbed target protein or the non-adsorbed polypeptide of the invention, and the mixture is incubated under conditions conducive to complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads or microtitre plate wells are washed to remove any unbound components, and complex formation is measured either directly or indirectly, for example, as described above. Alternatively, the complexes can be dissociated from the matrix, and the level of binding or activity of the polypeptide of the invention can be determined using standard techniques.

Other techniques for immobilizing proteins on matrices can also be used in the screening assays of the invention. For example, either the polypeptide of the invention or its target molecule can be immobilized utilizing conjugation of biotin and streptavidin. Biotinylated polypeptide of the invention or target molecules can be prepared from biotin-NHS (N-hydroxysuccinimide) using techniques well known in the art (e.g., biotinylation kit, Pierce Chemical; Rockford, IL), and immobilized in the wells of streptavidin-coated 96-well plates (Pierce Chemical). Alternatively, antibodies reactive with the polypeptide of the invention or target molecules, but which do not interfere with binding of the polypeptide of the invention to its target molecule, can be derivatized to the wells of the plate, and unbound target or polypeptide of the invention trapped in the wells by antibody conjugation. Methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the polypeptide of the invention or target molecule, as well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the polypeptide of the invention or target molecule.

In another embodiment, modulators of expression of a polypeptide of the invention are identified in a method in which a cell is contacted with a candidate compound and the expression of the selected mRNA or protein (i.e., the mRNA or protein corresponding to a polypeptide or nucleic acid of the invention) in the cell is determined. The level of expression of the selected mRNA or protein in the presence of the candidate compound is compared to the level of expression of the selected mRNA or protein in the absence of the candidate compound. The candidate compound can then be identified as a modulator of expression of the polypeptide of the invention based on this comparison. For example, when expression of the selected mRNA or protein is greater (statistically significantly greater) in the presence of the candidate compound than in its absence, the candidate compound is identified as a stimulator of the selected mRNA or protein expression. Alternatively, when expression of the selected mRNA or protein is less (statistically significantly less) in the presence of the candidate compound than in its absence, the candidate compound is identified as an inhibitor of the selected mRNA or protein expression. The level of the selected mRNA or protein expression in the cells can be determined by methods described herein.
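The comparison described above requires a statistically significant difference but does not name a test. The Python sketch below assumes a two-sided permutation test on mean expression from replicate wells and a 0.05 significance level. The function names, the alpha value, and the example expression values are hypothetical.

```python
# Minimal sketch: classifying a candidate compound as a stimulator or inhibitor
# of expression using a two-sided permutation test on the difference in mean
# expression. Test choice, alpha, and example values are assumptions.
import math
import random


def _mean(values):
    # math.fsum is correctly rounded, so the result does not depend on order.
    return math.fsum(values) / len(values)


def permutation_p_value(treated, control, n_permutations=5000, seed=0):
    """Monte Carlo two-sided p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(_mean(treated) - _mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_treated]) - _mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def classify_modulator(treated, control, alpha=0.05):
    """Return ('stimulator' | 'inhibitor' | 'no significant effect', p)."""
    p = permutation_p_value(treated, control)
    if p >= alpha:
        return "no significant effect", p
    if _mean(treated) > _mean(control):
        return "stimulator", p
    return "inhibitor", p


if __name__ == "__main__":
    control = [1.0, 1.1, 0.9, 1.0]  # hypothetical relative mRNA levels, no compound
    treated = [2.1, 2.3, 1.9, 2.2]  # hypothetical levels with candidate compound
    print(classify_modulator(treated, control)[0])  # stimulator
```

A permutation test is used here only because it needs no distributional assumptions and no third-party libraries; a t-test or another suitable method could serve equally well.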

In yet another aspect of the invention, polypeptides of the invention can be used as "bait proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent No. 5,283,317; Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054; Bartel et al. (1993) Bio/Techniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-1696; and PCT Publication No. WO 94/10300) to identify other proteins that bind to or interact with the polypeptide of the invention and modulate its activity. Such binding proteins are also likely to be involved in the propagation of signals by the polypeptides of the invention, for example, as upstream or downstream elements of a signaling pathway involving the polypeptide of the invention.

This invention further pertains to novel agents identified by the above-described screening assays and uses thereof for treatments as described herein.

B. Detection Assays 

Portions or fragments of the cDNA sequences identified herein (and the corresponding complete gene sequences) can be used in numerous ways as polynucleotide reagents. For example, these sequences can be used to: (i) map their respective genes on a chromosome and, thus, locate gene regions associated with genetic disease; (ii) identify an individual from a minute biological sample (tissue typing); and (iii) aid in forensic identification of a biological sample. These applications are described in the subsections below.

1. Chromosome Mapping

Once the sequence (or a portion of the sequence) of a gene has been isolated, this sequence can be used to map the location of the gene on a chromosome. Accordingly, nucleic acid molecules described herein or fragments thereof can be used to map the location of the corresponding genes on a chromosome. The mapping of the sequences to chromosomes is an important first step in correlating these sequences with genes associated with disease.

Briefly, genes can be mapped to chromosomes by preparing PCR primers (preferably 15-25 bp in length) from the sequence of a gene of the invention. Computer analysis of the sequence of a gene of the invention can be used to rapidly select primers that do not span more than one exon in the genomic DNA, since a primer spanning an exon junction in the cDNA is interrupted by an intron in the genomic DNA, which would complicate the amplification process. These primers can then be used for PCR screening of somatic cell hybrids containing individual human chromosomes. Only those hybrids containing the human gene corresponding to the gene sequences will yield an amplified fragment. For a review of this technique, see D'Eustachio et al. ((1983) Science 220:919-924).
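The primer-selection step above can be sketched as enumerating candidate windows that lie entirely inside one exon. This Python sketch assumes the exon junction positions are already known in cDNA coordinates (for example, from aligning the cDNA to genomic sequence). The function names and the example sequence are hypothetical, and practical design would also screen for melting temperature, GC content, and uniqueness.

```python
# Minimal sketch: listing candidate primer windows that lie entirely within a
# single exon, so that no primer spans an exon junction. Junction positions
# are assumed known in cDNA coordinates; names and data are hypothetical.
from typing import List, Tuple


def exon_intervals(cdna_length: int, junctions: List[int]) -> List[Tuple[int, int]]:
    """Split [0, cdna_length) into half-open exon intervals at junction offsets."""
    bounds = [0] + sorted(junctions) + [cdna_length]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def single_exon_primer_windows(cdna: str, junctions: List[int],
                               primer_len: int = 20) -> List[Tuple[int, str]]:
    """Return (start, sequence) for every primer-length window inside one exon."""
    if not 15 <= primer_len <= 25:
        raise ValueError("primer length outside the preferred 15-25 nt range")
    windows = []
    for start, end in exon_intervals(len(cdna), junctions):
        for i in range(start, end - primer_len + 1):
            windows.append((i, cdna[i:i + primer_len]))
    return windows


if __name__ == "__main__":
    cdna = "ACGT" * 12 + "AC"  # hypothetical 50 nt cDNA
    windows = single_exon_primer_windows(cdna, junctions=[30])
    print(len(windows))  # 12 (11 in exon 1, 1 in exon 2)
```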

PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular sequence to a particular chromosome. Three or more sequences can be assigned per day using a single thermal cycler. Using the nucleic acid sequences of the invention to design oligonucleotide primers, sublocalization can be achieved with panels of fragments from specific chromosomes. Other mapping strategies which can similarly be used to map a gene to its chromosome include in situ hybridization (described in Fan et al. (1990) Proc. Natl. Acad. Sci. USA 87:6223-27), pre-screening with labeled flow-sorted chromosomes, and preselection by hybridization to chromosome-specific cDNA libraries. Fluorescence in situ hybridization (FISH) of a DNA sequence to a metaphase chromosomal spread can further be used to provide a precise chromosomal location in one step. For a review of this technique, see Verma et al. (Human Chromosomes: A Manual of Basic Techniques (Pergamon Press, New York, 1988)).
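Scoring a hybrid panel amounts to a concordance check: the gene is assigned to the human chromosome whose presence or absence across the hybrids matches the PCR result in every hybrid. The Python sketch below assumes the primers do not amplify the rodent background genome. The panel contents and hybrid names are hypothetical.

```python
# Minimal sketch of somatic cell hybrid panel scoring by concordance. Assumes
# the primers amplify only the human gene; panel data are hypothetical.
from typing import Dict, List, Set


def concordant_chromosomes(panel: Dict[str, Set[str]],
                           pcr_positive: Dict[str, bool]) -> List[str]:
    """Return chromosomes whose presence matches the PCR result in every hybrid."""
    chromosomes = set().union(*panel.values())
    return sorted(
        chrom for chrom in chromosomes
        if all((chrom in panel[hybrid]) == positive
               for hybrid, positive in pcr_positive.items())
    )


if __name__ == "__main__":
    panel = {  # human chromosomes retained in each hybrid
        "H1": {"1", "7"},
        "H2": {"7", "X"},
        "H3": {"1", "X"},
        "H4": {"7"},
    }
    pcr = {"H1": True, "H2": True, "H3": False, "H4": True}
    print(concordant_chromosomes(panel, pcr))  # ['7']
```

An empty result points to inconsistent PCR data. More than one result means the panel cannot distinguish those chromosomes, and additional hybrids would be needed.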

Reagents for chromosome mapping can be used individually to mark a single chromosome or a single site on that chromosome, or panels of reagents can be used for marking multiple sites and/or multiple chromosomes. Reagents corresponding to noncoding regions of the genes actually are preferred for mapping purposes, because coding sequences are more likely to be conserved within gene families, thus increasing the chance of cross-hybridization during chromosomal mapping.

Once a sequence has been mapped to a precise chromosomal location, the physical position of the sequence on the chromosome can be correlated with genetic map data. (Such data are found, for example, in V. McKusick, Mendelian Inheritance in Man, available online through Johns Hopkins University Welch Medical Library.) The relationship between genes and disease, mapped to the same chromosomal region, can then be identified through linkage analysis (co-inheritance of physically adjacent genes), described in, e.g., Egeland et al. (1987) Nature 325:783-787.

Moreover, differences in the DNA sequences between individuals affected and unaffected with a disease associated with a gene of the invention can be determined. If a mutation is observed in some or all of the affected individuals but not in any unaffected individuals, then the mutation is likely to be the causative agent of the particular disease. Comparison of affected and unaffected individuals generally involves first looking for structural alterations in the chromosomes, such as deletions or translocations, that are visible from chromosome spreads or detectable using PCR based on that DNA sequence. Ultimately, complete sequencing of genes from several individuals can be performed to confirm the presence of a mutation and to distinguish mutations from polymorphisms.
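The affected-versus-unaffected criterion above can be sketched as a set filter. Keep variants seen in at least one affected individual and in no unaffected individual, then rank them by how many affected individuals carry them. The Python sketch below uses hypothetical individual IDs and HGVS-style variant labels. It only narrows the candidate list and does not by itself distinguish mutations from polymorphisms.

```python
# Minimal sketch: variants present in affected individuals but absent from all
# unaffected individuals, ranked by number of affected carriers. IDs and
# variant labels are hypothetical.
from collections import Counter
from typing import Dict, List, Set, Tuple


def affected_only_variants(affected: Dict[str, Set[str]],
                           unaffected: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Return (variant, number of affected carriers), most widely shared first."""
    seen_in_unaffected = set().union(*unaffected.values())
    carriers = Counter()
    for variants in affected.values():
        # Each individual contributes at most one count per variant.
        carriers.update(variants - seen_in_unaffected)
    return sorted(carriers.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    affected = {
        "A1": {"c.100G>A", "c.250C>T"},
        "A2": {"c.100G>A"},
        "A3": {"c.100G>A", "c.400del"},
    }
    unaffected = {"U1": {"c.250C>T"}, "U2": set()}
    print(affected_only_variants(affected, unaffected))
    # [('c.100G>A', 3), ('c.400del', 1)]
```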

2. Tissue Typing

The nucleic acid sequences of the present invention can also be used to identify individuals from minute biological samples. The United States military, for example, is considering the use of restriction fragment length polymorphism (RFLP) for identification of its personnel. In this technique, an individual's genomic DNA is digested with one or more restriction enzymes and probed on a Southern blot to yield unique bands for identification. This method does not suffer from the current limitations of "Dog Tags," which can be lost, switched, or stolen, making positive identification difficult. The sequences of the present invention are useful as additional DNA markers for RFLP (described in U.S. Patent 5,272,057).
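The RFLP principle can be sketched by predicting fragment lengths for two alleles. A single-base change that destroys a restriction site merges two fragments, which changes the band pattern on a Southern blot. The Python sketch below uses the EcoRI site (G^AATTC, cut after the G). The example sequences are hypothetical. Only the top strand of a linear molecule is modeled, which is sufficient for this palindromic site if the 4-nt overhang is ignored.

```python
# Minimal sketch: predicted restriction fragment lengths for two alleles,
# showing how a site-destroying point change alters the RFLP pattern.
# EcoRI site G^AATTC; example sequences are hypothetical.
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Return fragment lengths after cutting at every occurrence of site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    allele_a = "AAAA" + "GAATTC" + "AAAAAA" + "GAATTC" + "AAA"
    allele_b = "AAAA" + "GAATTC" + "AAAAAA" + "GAGTTC" + "AAA"  # A>G destroys 2nd site
    print(digest(allele_a))  # [5, 12, 8]
    print(digest(allele_b))  # [5, 20]
```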

Furthermore, the sequences of the present invention can be used to provide an alternative technique which determines the actual base-by-base DNA sequence of selected portions of an individual's genome. Thus, the nucleic acid sequences described herein can be used to prepare two PCR primers from the 5' and 3' ends of the sequences. These primers can then be used to amplify an individual's DNA and subsequently sequence it.

Panels of corresponding DNA sequences from individuals,
prepared in this manner, can provide unique individual
identifications, as each individual will have a unique
set of such DNA sequences due to allelic differences.
The sequences of the present invention can be used to
obtain such identification sequences from individuals and
from tissue. The nucleic acid sequences of the invention
uniquely represent portions of the human genome. Allelic
variation occurs to some degree in the coding regions of
these sequences, and to a greater degree in the noncoding
regions. It is estimated that allelic variation between
individual humans occurs with a frequency of about once
per 500 bases. Each of the sequences described
herein can, to some degree, be used as a standard against
which DNA from an individual can be compared for
identification purposes. Because greater numbers of
polymorphisms occur in the noncoding regions, fewer
sequences are necessary to differentiate individuals.
For example, the noncoding sequences of SEQ ID NO:1 can
comfortably provide positive individual identification
with a panel of perhaps 10 to 1,000 primers, each of which
yields a noncoding amplified sequence of 100 bases. If
predicted coding sequences, such as those in SEQ ID NO:3,
are used, a more appropriate number of primers for
positive individual identification would be 500 to 2,000.
If a panel of reagents from the nucleic acid sequences
described herein is used to generate a unique
identification database for an individual, those same
reagents can later be used to identify tissue from that
individual. Using the unique identification database,
positive identification of the individual, living or
dead, can be made from extremely small tissue samples.
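The stated rate of about one variant per 500 bases supports a rough estimate of how informative a panel of 100-base amplicons is. The Python sketch below is a back-of-the-envelope calculation, not part of the source. It assumes variant positions are independent and uniformly distributed at that rate, which real genomes do not strictly satisfy.

```python
RATE = 1 / 500  # approximate per-base allelic variation rate stated in the text


def expected_variant_sites(amplicon_len: int, n_amplicons: int, rate: float = RATE) -> float:
    """Expected number of variable positions across a panel of amplicons."""
    return amplicon_len * n_amplicons * rate


def p_amplicon_variable(amplicon_len: int, rate: float = RATE) -> float:
    """Probability that one amplicon contains at least one variant,
    assuming independent positions with a uniform variation rate."""
    return 1 - (1 - rate) ** amplicon_len


print(f"{expected_variant_sites(100, 10):.1f}")  # 2.0
print(round(p_amplicon_variable(100), 3))        # 0.181
```

Under these assumptions, fewer than one in five 100-base amplicons carries a variant at all. This is consistent with the text's point that panels of many primers are needed, and that noncoding regions (with higher variation) need fewer.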

3. Use of Partial Gene Sequences in Forensic Biology
DNA-based identification techniques can also be used in
forensic biology. Forensic biology is a scientific field
employing genetic typing of biological evidence found at
a crime scene as a means for positively identifying, for
example, a perpetrator of a crime. To make such an
identification, PCR technology can be used to amplify DNA
sequences taken from very small biological samples such
as tissues, e.g., hair or skin, or body fluids, e.g.,
blood, saliva, or semen found at a crime scene. The
amplified sequence can then be compared to a standard,
thereby allowing identification of the origin of the
biological sample.
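The comparison of an amplified sample against reference standards can be pictured as matching per-locus genotypes. In this Python sketch, the locus names, allele labels, and profile layout are hypothetical. A reference is reported as consistent when every locus typed in both profiles carries the same unordered allele pair. Real forensic matching also weighs allele frequencies to estimate match probabilities, which this sketch omits.

```python
Genotype = tuple[str, str]  # unordered pair of allele labels at one locus


def normalize(profile: dict[str, Genotype]) -> dict[str, Genotype]:
    # Sort each allele pair so ("B", "A") and ("A", "B") compare equal.
    return {locus: tuple(sorted(pair)) for locus, pair in profile.items()}


def consistent(sample: dict[str, Genotype], reference: dict[str, Genotype]) -> bool:
    """True if at least one locus is shared and all shared loci agree."""
    s, r = normalize(sample), normalize(reference)
    shared = s.keys() & r.keys()
    return bool(shared) and all(s[locus] == r[locus] for locus in shared)


def identify(sample, database) -> list[str]:
    return [name for name, ref in database.items() if consistent(sample, ref)]


# Hypothetical reference profiles and crime-scene sample.
database = {
    "individual_1": {"L1": ("A", "B"), "L2": ("C", "C")},
    "individual_2": {"L1": ("A", "A"), "L2": ("C", "D")},
}
crime_scene = {"L1": ("B", "A"), "L2": ("C", "C")}
print(identify(crime_scene, database))  # ['individual_1']
```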
The sequences of the present invention can be used to
provide polynucleotide reagents, e.g., PCR primers,
targeted to specific loci in the human genome, which can
enhance the reliability of DNA-based forensic
identifications by, for example, providing another
"identification marker" (i.e., another DNA sequence that
is unique to a particular individual). As mentioned
above, actual base sequence information can be used for
identification as an accurate alternative to patterns
formed by restriction enzyme generated fragments.

Sequences targeted to noncoding regions are particularly
appropriate for this use as greater numbers of
polymorphisms occur in the noncoding regions, making it
easier to differentiate individuals using this technique.
Examples of polynucleotide reagents include the nucleic
acid sequences of the invention or portions thereof,
e.g., fragments derived from noncoding regions having a
length of at least 20 or 30 bases.

The nucleic acid sequences described herein can further
be used to provide polynucleotide reagents, e.g., labeled
or labelable probes which can be used in, for example, an
in situ hybridization technique, to identify a specific
tissue, e.g., brain tissue. This can be very useful in
cases where a forensic pathologist is presented with a
tissue of unknown origin. Panels of such probes can be
used to identify tissue by species and/or by organ type.

C. Predictive Medicine 

The present invention also pertains to the field of
predictive medicine in which diagnostic assays,
prognostic assays, pharmacogenomics, and monitoring
clinical trials are used for prognostic (predictive)
purposes to thereby treat an individual prophylactically.
Accordingly, one aspect of the present invention relates
to diagnostic assays for determining expression of a
polypeptide or nucleic acid of the invention and/or
activity of a polypeptide of the invention, in the
context of a biological sample (e.g., blood, serum,
cells, tissue), to thereby determine whether an individual
is afflicted with a disease or disorder, or is at risk of
developing a disorder, associated with aberrant
expression or activity of a polypeptide of the invention.
The invention also provides for prognostic (or
predictive) assays for determining whether an individual
is at risk of developing a disorder associated with
aberrant expression or activity of a polypeptide of the
invention. For example, mutations in a gene of the
invention can be assayed in a biological sample. Such
assays can be used for prognostic or predictive purposes
to thereby prophylactically treat an individual prior to
the onset of a disorder characterized by or associated
with aberrant expression or activity of a polypeptide of
the invention.

Another aspect of the invention provides methods for
determining expression of a nucleic acid or polypeptide of the
invention or activity of a polypeptide of the invention
in an individual to thereby select appropriate
therapeutic or prophylactic agents for that individual
(referred to herein as "pharmacogenomics").

Pharmacogenomics allows for the selection of agents
(e.g., drugs) for therapeutic or prophylactic treatment
of an individual based on the genotype of the individual
(e.g., the genotype of the individual examined to
determine the ability of the individual to respond to a
particular agent).

Yet another aspect of the invention pertains to
monitoring the influence of agents (e.g., drugs or other
compounds) on the expression or activity of a polypeptide
of the invention in clinical trials. These and other
agents are described in further detail in the following
sections.



1. Diagnostic Assays 

An exemplary method for detecting the presence or
absence of a polypeptide or nucleic acid of the invention
in a biological sample involves obtaining a biological
sample from a test subject and contacting the biological
sample with a compound or an agent capable of detecting a
polypeptide or nucleic acid (e.g., mRNA, genomic DNA) of
the invention such that the presence of a polypeptide or
nucleic acid of the invention is detected in the
biological sample. A preferred agent for detecting mRNA
or genomic DNA encoding a polypeptide of the invention is
a labeled nucleic acid probe capable of hybridizing to
mRNA or genomic DNA encoding a polypeptide of the
invention. The nucleic acid probe can be, for example, a
full-length cDNA, such as the nucleic acid of SEQ ID
NOs:1-22, 34-43, and - or a portion thereof, such
as an oligonucleotide of at least 15, 30, 50, 100, 250 or
500 nucleotides in length and sufficient to specifically
hybridize under stringent conditions to an mRNA or genomic
DNA encoding a polypeptide of the invention. Other
suitable probes for use in the diagnostic assays of the
invention are described herein.
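A probe hybridizes to the strand that is complementary to it. Finding where a labeled probe would bind therefore amounts to searching the target for the probe's reverse complement. The Python sketch below approximates "stringent conditions" as requiring a perfect match. That is a simplifying assumption, since real stringency depends on temperature, salt, and probe length. The sequences and function name are hypothetical, the DNA alphabet stands in for mRNA (U written as T), and the 15-nucleotide minimum mirrors the shortest probe length mentioned above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_hits(probe: str, target: str, min_len: int = 15) -> list[int]:
    """Start positions in `target` where `probe` would base-pair perfectly."""
    if len(probe) < min_len:
        raise ValueError(f"probe shorter than {min_len} nucleotides")
    site = reverse_complement(probe)  # the sequence the probe pairs with
    target = target.upper()
    hits, start = [], target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)
    return hits


segment = "ATGCCGTAGCTAGGCT"        # hypothetical 16-nt target region
target = "GGGG" + segment + "GGGG"
probe = reverse_complement(segment)  # probe complementary to the segment
print(probe_hits(probe, target))     # [4]
```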

A preferred agent for detecting a polypeptide of the
invention is an antibody capable of binding to a
polypeptide of the invention, preferably an antibody with
a detectable label. Antibodies can be polyclonal, or
more preferably, monoclonal. An intact antibody, or a
fragment thereof (e.g., Fab or F(ab')2), can be used. The
term "labeled", with regard to the probe or antibody, is
intended to encompass direct labeling of the probe or
antibody by coupling (i.e., physically linking) a
detectable substance to the probe or antibody, as well as
indirect labeling of the probe or antibody by reactivity
with another reagent that is directly labeled. Examples
of indirect labeling include detection of a primary
antibody using a fluorescently labeled secondary antibody
and end-labeling of a DNA probe with biotin such that it
can be detected with fluorescently labeled streptavidin.
The term "biological sample" is intended to include
tissues, cells and biological fluids isolated from a
subject, as well as tissues, cells and fluids present
within a subject. That is, the detection method of the
invention can be used to detect mRNA, protein, or genomic
DNA in a biological sample in vitro as well as in vivo.
For example, in vitro techniques for detection of mRNA
include Northern hybridizations and in situ
hybridizations. In vitro techniques for detection of a
polypeptide of the invention include enzyme linked
immunosorbent assays (ELISAs), Western blots,
immunoprecipitations and immunofluorescence. In vitro
techniques for detection of genomic DNA include Southern
hybridizations. Furthermore, in vivo techniques for
detection of a polypeptide of the invention include
introducing into a subject a labeled antibody directed
against the polypeptide. For example, the antibody can
be labeled with a radioactive marker whose presence and
location in a subject can be detected by standard imaging
techniques.

In one embodiment, the biological sample contains
protein molecules from the test subject. Alternatively,
the biological sample can contain mRNA molecules from the
test subject or genomic DNA molecules from the test
subject. A preferred biological sample is a peripheral
blood leukocyte sample isolated by conventional means
from a subject.

In another embodiment, the methods further involve
obtaining a control biological sample from a control
subject, contacting the control sample with a compound or
agent capable of detecting a polypeptide of the invention
or mRNA or genomic DNA encoding a polypeptide of the
invention, such that the presence of the polypeptide or
mRNA or genomic DNA encoding the polypeptide is detected
in the biological sample, and comparing the presence of
the polypeptide or mRNA or genomic DNA encoding the
polypeptide in the control sample with the presence of
the polypeptide or mRNA or genomic DNA encoding the
polypeptide in the test sample.

The invention also encompasses kits for detecting the
presence of a polypeptide or nucleic acid of the
invention in a biological sample (a test sample). Such
kits can be used to determine if a subject is suffering
from or is at increased risk of developing a disorder
associated with aberrant expression of a polypeptide of
the invention (e.g., an immunological disorder). For
example, the kit can comprise a labeled compound or agent
capable of detecting the polypeptide or mRNA encoding the
polypeptide in a biological sample and means for
determining the amount of the polypeptide or mRNA in the
sample (e.g., an antibody which binds the polypeptide or
an oligonucleotide probe which binds to DNA or mRNA
encoding the polypeptide). Kits can also include
instructions for observing that the tested subject is
suffering from or is at risk of developing a disorder
associated with aberrant expression of the polypeptide if
the amount of the polypeptide or mRNA encoding the
polypeptide is above or below a normal level.
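The text does not define "a normal level." One common convention, assumed here and not stated in the source, takes the mean plus or minus two standard deviations of control measurements as the reference range. The Python sketch below classifies a test-sample level against such a range. The control values and the multiplier `k` are hypothetical, and a validated clinical assay would establish its own reference interval.

```python
from statistics import mean, stdev


def reference_range(control_levels: list[float], k: float = 2.0) -> tuple[float, float]:
    """Normal range as mean +/- k sample standard deviations of control levels."""
    m, s = mean(control_levels), stdev(control_levels)
    return m - k * s, m + k * s


def classify(level: float, control_levels: list[float], k: float = 2.0) -> str:
    low, high = reference_range(control_levels, k)
    if level < low:
        return "below normal"
    if level > high:
        return "above normal"
    return "within normal range"


controls = [10, 12, 11, 9, 13]  # hypothetical control measurements
print(classify(20, controls))  # above normal
print(classify(5, controls))   # below normal
print(classify(11, controls))  # within normal range
```

With these controls, the range is about 7.84 to 14.16.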

For antibody-based kits, the kit can comprise, for
example: (1) a first antibody (e.g., attached to a solid
support) which binds to a polypeptide of the invention;
and, optionally, (2) a second, different antibody which
binds to either the polypeptide or the first antibody and
is conjugated to a detectable agent.
For oligonucleotide-based kits, the kit can comprise,
for example: (1) an oligonucleotide, e.g., a detectably
labeled oligonucleotide, which hybridizes to a nucleic
acid sequence encoding a polypeptide of the invention or
(2) a pair of primers useful for amplifying a nucleic
acid molecule encoding a polypeptide of the invention.

The kit can also comprise, e.g., a buffering agent, a
preservative, or a protein stabilizing agent. The kit
can also comprise components necessary for detecting the
detectable agent (e.g., an enzyme or a substrate). The
kit can also contain a control sample or a series of
control samples which can be assayed and compared with the
test sample. Each component of the kit is
usually enclosed within an individual container and all
of the various containers are within a single package
along with instructions for observing whether the tested
subject is suffering from or is at risk of developing a
disorder associated with aberrant expression of the
polypeptide.

2. Prognostic Assays

The methods described herein can furthermore be
utilized as diagnostic or prognostic assays to identify
subjects having or at risk of developing a disease or
disorder associated with aberrant expression or activity
of a polypeptide of the invention. For example, the
assays described herein, such as the preceding diagnostic
assays or the following assays, can be utilized to
identify a subject having or at risk of developing a
disorder associated with aberrant expression or activity
of a polypeptide of the invention. Alternatively, the
prognostic assays can be utilized to identify a subject
having or at risk for developing such a disease or
disorder. Thus, the present invention provides a method
in which a test sample is obtained from a subject and a
polypeptide or nucleic acid (e.g., mRNA, genomic DNA) of
the invention is detected, wherein the presence of the
polypeptide or nucleic acid is diagnostic for a subject
having or at risk of developing a disease or disorder
associated with aberrant expression or activity of the
polypeptide. As used herein, a "test sample" refers to a
biological sample obtained from a subject of interest.
For example, a test sample can be a biological fluid
(e.g., serum), cell sample, or tissue.

Furthermore, the prognostic assays described herein can
be used to determine whether a subject can be
administered an agent (e.g., an agonist, antagonist,
peptidomimetic, protein, peptide, nucleic acid, small
molecule, or other drug candidate) to treat a disease or
disorder associated with aberrant expression or activity
of a polypeptide of the invention. For example, such
methods can be used to determine whether a subject can be
effectively treated with a specific agent or class of
agents (e.g., agents of a type which decrease activity of
the polypeptide). Thus, the present invention provides
methods for determining whether a subject can be
effectively treated with an agent for a disorder
associated with aberrant expression or activity of a
polypeptide of the invention in which a test sample is
obtained and the polypeptide or nucleic acid encoding the
polypeptide is detected (e.g., wherein the presence of
the polypeptide or nucleic acid is diagnostic for a
subject that can be administered the agent to treat a
disorder associated with aberrant expression or activity
of the polypeptide).

The methods of the invention can also be used to detect
genetic lesions or mutations in a gene of the invention,
thereby determining if a subject with the lesioned gene
is at risk for a disorder characterized by aberrant
expression or activity of a polypeptide of the invention.
In preferred embodiments, the methods include detecting,
in a sample of cells from the subject, the presence or
absence of a genetic lesion or mutation characterized by
at least one of an alteration affecting the integrity of
a gene encoding the polypeptide of the invention, or the
mis-expression of the gene encoding the polypeptide of
the invention. For example, such genetic lesions or
mutations can be detected by ascertaining the existence
of at least one of: 1) a deletion of one or more
nucleotides from the gene; 2) an addition of one or more
nucleotides to the gene; 3) a substitution of one or more
nucleotides of the gene; 4) a chromosomal rearrangement
of the gene; 5) an alteration in the level of a messenger
RNA transcript of the gene; 6) an aberrant modification
of the gene, such as of the methylation pattern of the
genomic DNA; 7) the presence of a non-wild type splicing
pattern of a messenger RNA transcript of the gene; 8) a
non-wild type level of the protein encoded by the gene;
9) an allelic loss of the gene; and 10) an inappropriate
post-translational modification of the protein encoded by
the gene. As described herein, there are a large number
of assay techniques known in the art which can be used
for detecting lesions in a gene.
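The first three lesion types (deletion, addition, substitution) can be called by aligning a sample sequence against a reference. The Python sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in aligner. That is an assumption made for brevity: it is a generic string-diff heuristic, not a biological sequence aligner, and in runs of identical bases the reported position of an indel is arbitrary. The sequences are hypothetical.

```python
from difflib import SequenceMatcher


def classify_lesions(reference: str, sample: str) -> list[tuple[str, int, str, str]]:
    """List (kind, reference_position, reference_bases, sample_bases) per difference.

    'delete' opcodes become deletions, 'insert' opcodes become additions and
    'replace' opcodes become substitutions.
    """
    kinds = {"delete": "deletion", "insert": "addition", "replace": "substitution"}
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    return [
        (kinds[tag], i1, reference[i1:i2], sample[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


ref = "ATGGCCTTAGC"  # hypothetical reference fragment
print(classify_lesions(ref, "ATGGCATTAGC"))  # [('substitution', 5, 'C', 'A')]
print(classify_lesions(ref, "ATGGCTTAGC"))   # [('deletion', 4, 'C', '')]
```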

In certain embodiments, detection of the lesion
involves the use of a probe/primer in a polymerase chain
reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and
4,683,202), such as anchor PCR or RACE PCR, or,
alternatively, in a ligation chain reaction (LCR) (see,
e.g., Landegren et al. (1988) Science 241:1077-1080; and
Nakazawa et al. (1994) Proc. Natl. Acad. Sci. USA 91:360-364),
the latter of which can be particularly useful for
detecting point mutations in a gene (see, e.g., Abravaya
et al. (1995) Nucleic Acids Res. 23:675-682). This
method can include the steps of collecting a sample of
cells from a patient, isolating nucleic acid (e.g.,
genomic, mRNA or both) from the cells of the sample,
contacting the nucleic acid sample with one or more
primers which specifically hybridize to the selected gene
under conditions such that hybridization and
amplification of the gene (if present) occurs, and
detecting the presence or absence of an amplification
product, or detecting the size of the amplification
product and comparing the length to a control sample. It
is anticipated that PCR and/or LCR may be desirable to
use as a preliminary amplification step in conjunction
with any of the techniques used for detecting mutations
described herein.
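The size comparison step can be modeled by locating both primer sites on a template and measuring the span between them. The Python sketch below is a simplified illustration: it ignores annealing mismatches and amplification efficiency. The primer sequences, templates, and function name are hypothetical. A deletion between the primer sites shortens the predicted product relative to the control.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Predicted product size (forward site through reverse site), or None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the bottom strand, so on the top strand
    # its binding site reads as the primer's reverse complement.
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start)
    if end == -1:
        return None
    return end + len(rev_site) - start


forward, reverse = "GATTACAG", "TTAACGCG"                     # hypothetical primers
control = "CCC" + "GATTACAG" + "T" * 10 + "CGCGTTAA" + "CCC"
sample = "CCC" + "GATTACAG" + "T" * 6 + "CGCGTTAA" + "CCC"    # 4-base deletion
print(amplicon_length(control, forward, reverse))  # 26
print(amplicon_length(sample, forward, reverse))   # 22
```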

Alternative amplification methods include: self-sustained
sequence replication (Guatelli et al. (1990)
Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional
amplification system (Kwoh et al. (1989) Proc. Natl.
Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi
et al. (1988) Bio/Technology 6:1197), or any other
nucleic acid amplification method, followed by the
detection of the amplified molecules using techniques
well known to those of skill in the art. These detection
schemes are especially useful for the detection of
nucleic acid molecules if such molecules are present in
very low numbers.

In an alternative embodiment, mutations in a selected
gene from a sample cell can be identified by alterations
in restriction enzyme cleavage patterns. For example,
sample and control DNA is isolated, amplified
(optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined
by gel electrophoresis and compared. Differences in
fragment length sizes between sample and control DNA
indicate mutations in the sample DNA. Moreover,
sequence-specific ribozymes (see, e.g., U.S. Patent
No. 5,498,531) can be used to score for the presence of
specific mutations by development or loss of a ribozyme
cleavage site.
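Scoring a mutation by the development or loss of a cleavage site reduces to comparing recognition-site positions in control and sample DNA. The Python sketch below assumes equal-length sequences differing only by substitutions, so that positions stay comparable. The site and sequences are hypothetical, and the same logic applies whether the cleaving agent is a restriction enzyme or a sequence-specific ribozyme.

```python
def site_positions(seq: str, site: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def site_changes(control: str, sample: str, site: str) -> dict[str, list[int]]:
    """Cleavage sites lost or gained in the sample relative to the control."""
    before = set(site_positions(control, site))
    after = set(site_positions(sample, site))
    return {"lost": sorted(before - after), "gained": sorted(after - before)}


control = "TTGAATTCTT"
sample = "TTGACTTCTT"  # hypothetical A->C change destroys the GAATTC site
print(site_changes(control, sample, "GAATTC"))  # {'lost': [2], 'gained': []}
```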

In other embodiments, genetic mutations can be
identified by hybridizing sample and control nucleic
acids, e.g., DNA or RNA, to high-density arrays
containing hundreds or thousands of oligonucleotide
probes (Cronin et al. (1996) Human Mutation 7:244-255;
Kozal et al. (1996) Nature Medicine 2:753-759). For
example, genetic mutations can be identified in
two-dimensional arrays containing light-generated DNA probes
as described in Cronin et al., supra. Briefly, a first
hybridization array of probes can be used to scan through
long stretches of DNA in a sample and control to identify
base changes between the sequences by making linear
arrays of sequential overlapping probes. This step
allows the identification of point mutations. This step
is followed by a second hybridization array that allows
the characterization of specific mutations by using
smaller, specialized probe arrays complementary to all
variants or mutations detected. Each mutation array is
composed of parallel probe sets, one complementary to the
wild-type gene and the other complementary to the mutant
gene.
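The first-stage scan with sequential overlapping probes can be sketched in Python. The sketch tiles the reference with probes of length k, one starting at each position. A point mutation disrupts every probe that covers it, so the base shared by all failing probes localizes the change. The probe length, sequences, and "perfect match at the same offset" model of hybridization are simplifying assumptions, not details from the cited array methods. A substitution near either end of the sequence is covered by fewer probes and may localize only to a short window.

```python
def tile(reference: str, k: int) -> list[tuple[int, str]]:
    """Overlapping probes of length k, one starting at every reference position."""
    return [(i, reference[i:i + k]) for i in range(len(reference) - k + 1)]


def failing_probes(reference: str, sample: str, k: int) -> list[int]:
    """Start positions of probes without a perfect match in the sample.

    Simplifying assumption: equal-length sequences (no indels), and a probe
    'hybridizes' only if the sample carries the identical k-mer at that offset.
    """
    return [i for i, probe in tile(reference, k) if sample[i:i + k] != probe]


def localize(reference: str, sample: str, k: int) -> list[int]:
    """Positions covered by every failing probe."""
    fails = failing_probes(reference, sample, k)
    if not fails:
        return []
    candidates = set(range(fails[0], fails[0] + k))
    for start in fails[1:]:
        candidates &= set(range(start, start + k))
    return sorted(candidates)


ref = "ACGTACGTAC"
sample = "ACGTAGGTAC"  # hypothetical C->G substitution at index 5
print(failing_probes(ref, sample, 3))  # [3, 4, 5]
print(localize(ref, sample, 3))        # [5]
```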

In yet another embodiment, any of a variety of
sequencing reactions known in the art can be used to
directly sequence the selected gene and detect mutations
by comparing the sequence of the sample nucleic acids
with the corresponding wild-type (control) sequence.
Examples of sequencing reactions include those based on
techniques developed by Maxam and Gilbert ((1977) Proc.
Natl. Acad. Sci. USA 74:560) or Sanger ((1977) Proc.
Natl. Acad. Sci. USA 74:5463). It is also contemplated
that any of a variety of automated sequencing procedures
can be utilized when performing the diagnostic assays
((1995) Bio/Techniques 19:448), including sequencing by
mass spectrometry (see, e.g., PCT Publication No. WO
94/16101; Cohen et al. (1996) Adv. Chromatogr. 36:127-162;
and Griffin et al. (1993) Appl. Biochem. Biotechnol.
38:147-159).

Other methods for detecting mutations in a selected
gene include methods in which protection from cleavage
agents is used to detect mismatched bases in RNA/RNA or
RNA/DNA heteroduplexes (Myers et al. (1985) Science
230:1242). In general, the technique of "mismatch
cleavage" entails providing heteroduplexes formed by
hybridizing (labeled) RNA or DNA containing the wild-type
sequence with potentially mutant RNA or DNA obtained from
a tissue sample. The double-stranded duplexes are
treated with an agent which cleaves single-stranded
regions of the duplex, such as those that exist due to
base pair mismatches between the control and sample
strands. RNA/DNA duplexes can be treated with RNase to
digest mismatched regions, and DNA/DNA hybrids can be
treated with S1 nuclease to digest mismatched regions.
In other embodiments, either DNA/DNA or RNA/DNA duplexes
can be treated with hydroxylamine or osmium tetroxide and
with piperidine in order to digest mismatched regions.
After digestion of the mismatched regions, the resulting
material is separated by size on denaturing
polyacrylamide gels to determine the site of mutation.
See, e.g., Cotton et al. (1988) Proc. Natl. Acad. Sci.
USA 85:4397; Saleeba et al. (1992) Methods Enzymol.
217:286-295. In a preferred embodiment, the control DNA
or RNA can be labeled for detection.

In still another embodiment, the mismatch cleavage
reaction employs one or more proteins that recognize
mismatched base pairs in double-stranded DNA (so-called
"DNA mismatch repair" enzymes) in defined systems for
detecting and mapping point mutations in cDNAs obtained
from samples of cells. For example, the mutY enzyme of
E. coli cleaves A at G/A mismatches and the thymidine DNA
glycosylase from HeLa cells cleaves T at G/T mismatches
(Hsu et al. (1994) Carcinogenesis 15:1657-1662).
According to an exemplary embodiment, a probe based on a
selected sequence, e.g., a wild-type sequence, is
hybridized to a cDNA or other DNA product from a test
cell(s). The duplex is treated with a DNA mismatch
repair enzyme, and the cleavage products, if any, can be
detected from electrophoresis protocols or the like.
See, e.g., U.S. Patent No. 5,459,039.

In other embodiments, alterations in electrophoretic
mobility will be used to identify mutations in genes.
For example, single strand conformation polymorphism
(SSCP) may be used to detect differences in
electrophoretic mobility between mutant and wild type
nucleic acids (Orita et al. (1989) Proc. Natl. Acad. Sci.
USA 86:2766; see also Cotton (1993) Mutat. Res. 285:125-144;
Hayashi (1992) Genet. Anal. Tech. Appl. 9:73-79).
Single-stranded DNA fragments of sample and control
nucleic acids will be denatured and allowed to renature.
The secondary structure of single-stranded nucleic acids
varies according to sequence, and the resulting
alteration in electrophoretic mobility enables the
detection of even a single base change. The DNA
fragments may be labeled or detected with labeled probes.
The sensitivity of the assay may be enhanced by using RNA
(rather than DNA), in which the secondary structure is
more sensitive to a change in sequence. In a preferred
embodiment, the subject method utilizes heteroduplex
analysis to separate double-stranded heteroduplex
molecules on the basis of changes in electrophoretic
mobility (Keen et al. (1991) Trends Genet. 7:5).

In yet another embodiment, the movement of mutant or
wild-type fragments in polyacrylamide gels containing a
gradient of denaturant is assayed using denaturing
gradient gel electrophoresis (DGGE) (Myers et al. (1985)
Nature 313:495). When DGGE is used as the method of
analysis, DNA will be modified to ensure that it does not
completely denature, for example by adding a "GC clamp" of
approximately 40 bp of high-melting GC-rich DNA by PCR.
In a further embodiment, a temperature gradient is used
in place of a denaturing gradient to identify differences
in the mobility of control and sample DNA (Rosenbaum and
Reissner (1987) Biophys. Chem. 265:12753).

Examples of other techniques for detecting point
mutations include, but are not limited to, selective
oligonucleotide hybridization, selective amplification,
or selective primer extension. For example,
oligonucleotide primers may be prepared in which the
known mutation is placed centrally and then hybridized to
target DNA under conditions which permit hybridization
only if a perfect match is found (Saiki et al. (1986)
Nature 324:163; Saiki et al. (1989) Proc. Natl. Acad.
Sci. USA 86:6230). Such allele-specific oligonucleotides
are hybridized to PCR-amplified target DNA or a number of
different mutations when the oligonucleotides are
attached to the hybridizing membrane and hybridized with
labeled target DNA.

Alternatively, allele-specific amplification technology
which depends on selective PCR amplification may be used
in conjunction with the instant invention.
Oligonucleotides used as primers for specific
amplification may carry the mutation of interest in the
center of the molecule (so that amplification depends on
differential hybridization) (Gibbs et al. (1989) Nucleic
Acids Res. 17:2437-2448) or at the extreme 3' end of one
primer where, under appropriate conditions, mismatch can
prevent or reduce polymerase extension (Prossner (1993)
Tibtech 11:238). In addition, it may be desirable to
introduce a novel restriction site in the region of the
mutation to create cleavage-based detection (Gasparini et
al. (1992) Mol. Cell Probes 6:1). It is anticipated that
in certain embodiments amplification may also be
performed using Taq ligase for amplification (Barany
(1991) Proc. Natl. Acad. Sci. USA 88:189). In such
cases, ligation will occur only if there is a perfect
match at the 3' end of the 5' sequence, making it possible
to detect the presence of a known mutation at a specific
site by looking for the presence or absence of
amplification.
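The 3'-end rule used in allele-specific extension (and, analogously, in ligation-based detection) can be expressed as a simple decision in Python. This is a toy model under stated assumptions. Each primer is written in the same sense as the top-strand sequence it copies. Extension is allowed only if the primer's 3'-terminal base matches the template, with up to `max_internal_mismatches` tolerated elsewhere. The sequences, primer placement, and tolerance are hypothetical. Real discrimination depends on polymerase, temperature, and buffer conditions.

```python
def extends(primer: str, sense: str, position: int, max_internal_mismatches: int = 1) -> bool:
    """Toy allele-specific extension check for a primer aligned at `position`."""
    target = sense[position:position + len(primer)]
    if len(target) < len(primer):
        return False
    # A mismatch at the 3'-terminal base blocks extension.
    if primer[-1] != target[-1]:
        return False
    internal = sum(p != t for p, t in zip(primer[:-1], target[:-1]))
    return internal <= max_internal_mismatches


wild_type = "CCATGGTACGA"
mutant = "CCATGGTGCGA"  # hypothetical A->G change at index 7
# Each primer ends on the variable base, so only the matching allele extends.
primers = {"wild-type": wild_type[2:8], "mutant": mutant[2:8]}


def call(sequence: str) -> list[str]:
    return [allele for allele, p in primers.items() if extends(p, sequence, 2)]


print(call(wild_type))  # ['wild-type']
print(call(mutant))     # ['mutant']
```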

The methods described herein may be performed, for
example, by utilizing pre-packaged diagnostic kits
comprising at least one probe nucleic acid or antibody
reagent described herein, which may be conveniently used,
e.g., in clinical settings to diagnose patients
exhibiting symptoms or family history of a disease or
illness involving a gene encoding a polypeptide of the
invention.

Furthermore, any cell type or tissue, preferably
peripheral blood leukocytes, in which the polypeptide of
the invention is expressed may be utilized in the
prognostic assays described herein.

3. Pharmacogenomics

Agents or modulators which have a stimulatory or
inhibitory effect on activity or expression of a
polypeptide of the invention as identified by a screening
assay described herein can be administered to individuals
to treat (prophylactically or therapeutically) disorders
associated with aberrant activity of the polypeptide. In
conjunction with such treatment, the pharmacogenomics
(i.e., the study of the relationship between an
individual's genotype and that individual's response to a
foreign compound or drug) of the individual may be
considered. Differences in metabolism of therapeutics
can lead to severe toxicity or therapeutic failure by
altering the relation between dose and blood
concentration of the pharmacologically active drug. Thus,
the pharmacogenomics of the individual permits the
selection of effective agents (e.g., drugs) for
prophylactic or therapeutic treatments based on a
consideration of the individual's genotype. Such
pharmacogenomics can further be used to determine
appropriate dosages and therapeutic regimens.

Accordingly, the activity of a polypeptide of the
invention, expression of a nucleic acid of the
invention, or mutation content of a gene of the invention
in an individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic
treatment of the individual.

Pharmacogenomics deals with clinically significant
hereditary variations in the response to drugs due to
altered drug disposition and abnormal action in affected
persons. See, e.g., Linder (1997) Clin. Chem. 43(2):254-266.
In general, two types of pharmacogenetic conditions
can be differentiated. Genetic conditions transmitted as
a single factor altering the way drugs act on the body
are referred to as "altered drug action." Genetic
conditions transmitted as single factors altering the way
the body acts on drugs are referred to as "altered drug
metabolism." These pharmacogenetic conditions can occur
either as rare defects or as polymorphisms. For example,
glucose-6-phosphate dehydrogenase (G6PD) deficiency is a
common inherited enzymopathy in which the main clinical
complication is haemolysis after ingestion of oxidant
drugs (anti-malarials, sulfonamides, analgesics,
nitrofurans) and consumption of fava beans.

As an illustrative embodiment, the activity of drug
metabolizing enzymes is a major determinant of both the
intensity and duration of drug action. The discovery of
genetic polymorphisms of drug metabolizing enzymes (e.g.,
N-acetyltransferase 2 (NAT 2) and cytochrome P450 enzymes
CYP2D6 and CYP2C19) has provided an explanation as to why
some patients do not obtain the expected drug effects or
show exaggerated drug response and serious toxicity after
taking the standard and safe dose of a drug. These
polymorphisms are expressed in two phenotypes in the
population, the extensive metabolizer (EM) and the poor
metabolizer (PM). The prevalence of PM differs among
populations. For example, the gene coding for CYP2D6 is
highly polymorphic and several mutations have been
identified in PM, all of which lead to the absence of
functional CYP2D6. Poor metabolizers of CYP2D6 and
CYP2C19 quite frequently experience exaggerated drug
response and side effects when they receive standard
doses. If a metabolite is the active therapeutic moiety,
a PM will show no therapeutic response, as demonstrated
for the analgesic effect of codeine mediated by its
CYP2D6-formed metabolite morphine. At the other extreme
are the so-called ultra-rapid metabolizers, who do not
respond to standard doses. Recently, the molecular basis
of ultra-rapid metabolism has been identified as CYP2D6
gene amplification.
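The three phenotypes described above can be illustrated with a small classifier. This is a simplified sketch, not a clinical genotyping rule: it assumes the only input is the number of functional CYP2D6 gene copies, treats zero copies as PM (absence of functional enzyme), treats any count above an assumed normal diploid value of two as ultra-rapid (gene amplification), and classes everything else as EM. The names `Phenotype`, `classify_cyp2d6`, and `NORMAL_COPY_NUMBER` are hypothetical.

```python
from enum import Enum


class Phenotype(Enum):
    POOR = "PM"          # no functional CYP2D6
    EXTENSIVE = "EM"     # functional enzyme at ordinary copy number
    ULTRA_RAPID = "UM"   # amplified CYP2D6 gene


# Assumption: a diploid genome normally carries two CYP2D6 copies;
# any higher count is treated as gene amplification.
NORMAL_COPY_NUMBER = 2


def classify_cyp2d6(functional_copies: int) -> Phenotype:
    """Map the number of functional CYP2D6 gene copies to a phenotype.

    Simplified three-way scheme following the text; it ignores
    reduced-function alleles and intermediate phenotypes.
    """
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return Phenotype.POOR
    if functional_copies > NORMAL_COPY_NUMBER:
        return Phenotype.ULTRA_RAPID
    return Phenotype.EXTENSIVE
```

Under these assumptions a single functional copy is still classed as EM; schemes used in practice add intermediate categories and weight individual alleles by activity, which this sketch deliberately omits.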

Thus, the activity of a polypeptide of the invention,
expression of a nucleic acid encoding the polypeptide, or
mutation content of a gene encoding the polypeptide in an
individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic
treatment of the individual. In addition,
pharmacogenetic studies can be used to apply genotyping
of polymorphic alleles encoding drug-metabolizing enzymes
to the identification of an individual's drug
responsiveness phenotype. This knowledge, when applied
to dosing or drug selection, can avoid adverse reactions
or therapeutic failure and thus enhance therapeutic or
prophylactic efficiency when treating a subject with a
modulator of activity or expression of the polypeptide,
such as a modulator identified by one of the exemplary
screening assays described herein.

4. Monitoring of Effects During Clinical Trials
Monitoring the influence of agents (e.g., drugs,
compounds) on the expression or activity of a polypeptide
of the invention (e.g., the ability to modulate aberrant
cell proliferation and/or differentiation) can be applied
not only in basic drug screening, but also in clinical
trials. For example, the effectiveness of an agent, as
determined by a screening assay as described herein, to
increase gene expression, protein levels, or protein
activity can be monitored in clinical trials of subjects
exhibiting decreased gene expression, protein levels, or
protein activity. Alternatively, the effectiveness of an
agent, as determined by a screening assay, to decrease
gene expression, protein levels, or protein activity can
be monitored in clinical trials of subjects exhibiting
increased gene expression, protein levels, or protein
activity. In such clinical trials, the expression or
activity of a polypeptide of the invention and,
preferably, that of other polypeptides that have been
implicated in, for example, a cellular proliferation
disorder can be used as a marker of the immune
responsiveness of a particular cell.

For example, and not by way of limitation, genes,
including those of the invention, that are modulated in
cells by treatment with an agent (e.g., compound, drug or
small molecule) which modulates activity or expression of
a polypeptide of the invention (e.g., as identified in a
screening assay described herein) can be identified.
Thus, to study the effect of agents on cellular
proliferation disorders, for example, in a clinical
trial, cells can be isolated and RNA prepared and
analyzed for the levels of expression of a gene of the
invention and other genes implicated in the disorder.
The levels of gene expression (i.e., a gene expression
pattern) can be quantified by Northern blot analysis or
RT-PCR, as described herein, or alternatively by
measuring the amount of protein produced, by one of the
methods as described herein, or by measuring the levels
of activity of a gene of the invention or other genes.
In this way, the gene expression pattern can serve as a
marker, indicative of the physiological response of the
cells to the agent. Accordingly, this response state may
be determined before, and at various points during,
treatment of the individual with the agent.

In a preferred embodiment, the present invention
provides a method for monitoring the effectiveness of
treatment of a subject with an agent (e.g., an agonist,
antagonist, peptidomimetic, protein, peptide, nucleic
acid, small molecule, or other drug candidate identified
by the screening assays described herein) comprising the
steps of (i) obtaining a pre-administration sample from a
subject prior to administration of the agent; (ii)
detecting the level of the polypeptide or nucleic acid of
the invention in the pre-administration sample; (iii)
obtaining one or more post-administration samples from
the subject; (iv) detecting the level of the
polypeptide or nucleic acid of the invention in the
post-administration samples; (v) comparing the level of
the polypeptide or nucleic acid of the invention in the
pre-administration sample with the level of the
polypeptide or nucleic acid of the invention in the
post-administration sample or samples; and (vi) altering
the administration of the agent to the subject accordingly.
For example, increased administration of the agent may be
desirable to increase the expression or activity of the
polypeptide to higher levels than detected, i.e., to
increase the effectiveness of the agent. Alternatively,
decreased administration of the agent may be desirable to
decrease expression or activity of the polypeptide to
lower levels than detected, i.e., to decrease the
effectiveness of the agent.
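Steps (v) and (vi) of this monitoring method amount to comparing levels and then adjusting administration. The sketch below assumes a hypothetical target fold change in the level of the polypeptide or nucleic acid and a hypothetical tolerance band around it; neither value is specified by the method, and `monitor_treatment` and `DoseDecision` are illustrative names only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class DoseDecision:
    fold_change: float  # mean post-administration level / pre-administration level
    action: str         # "increase", "decrease", or "maintain"


def monitor_treatment(pre_level: float,
                      post_levels: Sequence[float],
                      target_fold_change: float,
                      tolerance: float = 0.1) -> DoseDecision:
    """Compare pre- and post-administration levels (step v) and suggest
    how to alter administration of the agent (step vi).

    target_fold_change and tolerance are hypothetical, user-chosen
    parameters, not values taken from the method.
    """
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    if not post_levels:
        raise ValueError("need at least one post-administration sample")
    fold = mean(post_levels) / pre_level
    if fold < target_fold_change * (1 - tolerance):
        action = "increase"  # level still below target: give more agent
    elif fold > target_fold_change * (1 + tolerance):
        action = "decrease"  # level overshoots target: give less agent
    else:
        action = "maintain"
    return DoseDecision(fold_change=fold, action=action)
```

For example, a pre-administration level of 10 and post-administration levels of 12 and 14 give a fold change of 1.3; against an assumed target of 2.0 the sketch suggests increasing administration. A target below 1.0 covers agents intended to decrease expression or activity.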

C. Methods of Treatment 

The present invention provides for both prophylactic
and therapeutic methods of treating a subject at risk of
(or susceptible to) a disorder or having a disorder
associated with aberrant expression or activity of a
polypeptide of the invention.

1. Prophylactic Methods

In one aspect, the invention provides a method for
preventing, in a subject, a disease or condition
associated with an aberrant expression or activity of a
polypeptide of the invention, by administering to the
subject an agent which modulates expression or at least
one activity of the polypeptide. Subjects at risk for a
disease which is caused or contributed to by aberrant
expression or activity of a polypeptide of the invention
can be identified by, for example, any or a combination
of diagnostic or prognostic assays as described herein.
Administration of a prophylactic agent can occur prior to
the manifestation of symptoms characteristic of the
aberrancy, such that a disease or disorder is prevented
or, alternatively, delayed in its progression. Depending
on the type of aberrancy, for example, an agonist or
antagonist agent can be used for treating the subject.
The appropriate agent can be determined based on
screening assays described herein.
2. Therapeutic Methods

Another aspect of the invention pertains to methods of
modulating expression or activity of a polypeptide of the
invention for therapeutic purposes. The modulatory
method of the invention involves contacting a cell with
an agent that modulates one or more of the activities of
the polypeptide. An agent that modulates activity can be
an agent as described herein, such as a nucleic acid or a
protein, a naturally-occurring cognate ligand of the
polypeptide, a peptide, a peptidomimetic, or other small
molecule. In one embodiment, the agent stimulates one or
more of the biological activities of the polypeptide.
Examples of such stimulatory agents include the active
polypeptide of the invention and a nucleic acid molecule
encoding the polypeptide of the invention that has been
introduced into the cell. In another embodiment, the
agent inhibits one or more of the biological activities
of the polypeptide of the invention. Examples of such
inhibitory agents include antisense nucleic acid
molecules and antibodies. These modulatory methods can
be performed in vitro (e.g., by culturing the cell with
the agent) or, alternatively, in vivo (e.g., by
administering the agent to a subject). As such, the
present invention provides methods of treating an
individual afflicted with a disease or disorder
characterized by aberrant expression or activity of a
polypeptide of the invention. In one embodiment, the
method involves administering an agent (e.g., an agent
identified by a screening assay described herein), or a
combination of agents, that modulates (e.g., upregulates
or downregulates) expression or activity. In another
embodiment, the method involves administering a
polypeptide of the invention or a nucleic acid molecule
of the invention as therapy to compensate for reduced or
aberrant expression or activity of the polypeptide.
Stimulation of activity is desirable in situations in
which activity or expression is abnormally low or
downregulated and/or in which increased activity is
likely to have a beneficial effect. Conversely,
inhibition of activity is desirable in situations in
which activity or expression is abnormally high or
upregulated and/or in which decreased activity is likely
to have a beneficial effect.

This invention is further illustrated by the following
examples, which should not be construed as limiting. The
contents of all references, patents and published patent
applications cited throughout this application are hereby
incorporated by reference.

EXAMPLES 

TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184,
TANGO 185, TANGO 186, TANGO 188, TANGO 189, and TANGO 187
were identified in a human prostate epithelial cell
library. TANGO 215 was identified in a human prostate
stromal cell library.

TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184,
TANGO 185, TANGO 186, TANGO 188, TANGO 189, TANGO 215,
and TANGO 187 were identified by first analyzing clones
present in the two libraries to identify EST sequences
that potentially encode a signal peptide having at least
15 amino acids. Selected clones that include an EST
sequence that appeared to encode a signal peptide having
at least 15 amino acids were used to assemble additional
EST sequences to form potential full-length gene
sequences. The assembled full-length gene sequences were
then used to identify actual full-length clones in the
two libraries.
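The first screening step, finding ESTs that could encode an N-terminal segment of at least 15 amino acids, can be approximated by a crude open-reading-frame length filter. This is only an assumed prefilter: the text does not say how signal peptides were predicted, and the sketch does not judge whether a segment actually resembles a signal peptide (for example, whether it has a hydrophobic core), which would require a dedicated prediction tool. `has_min_orf` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_min_orf(est: str, min_aa: int = 15) -> bool:
    """Return True if any forward-strand ATG opens a reading frame of at
    least min_aa codons (ATG included) before a stop codon or the end of
    the read. A crude length filter, not a signal-peptide predictor.
    """
    seq = est.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        n_codons = 0
        # Walk complete codons in this frame until a stop codon is met.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                break
            n_codons += 1
        if n_codons >= min_aa:
            return True
    return False
```

Reading frames that run off the 3' end of the read are counted, since an EST is often a partial sequence; a stricter filter could require an in-frame stop codon.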
Deposit of Clones 

Clones containing cDNA molecules encoding TANGO 180,
TANGO 181, TANGO 182, TANGO 183, TANGO 184, TANGO 185,
TANGO 186, TANGO 188, TANGO 189, TANGO 215, and TANGO 187
were deposited with the American Type Culture Collection
(Manassas, VA) as composite deposits.

Clones encoding TANGO 180, TANGO 181, TANGO 182,
TANGO 183, and TANGO 184 were deposited on September 25,
1998 with the American Type Culture Collection under
accession number ATCC 98901, from which each strain
comprising a particular cDNA clone is obtainable. This
deposit is a mixture of five strains, each carrying one
recombinant plasmid harboring a particular cDNA clone.
To distinguish the strains and isolate a strain harboring
a particular cDNA clone, one can first streak out an
aliquot of the mixture to single colonies on nutrient
medium (e.g., LB plates) supplemented with 100 µg/ml
ampicillin, grow single colonies, and then extract the
plasmid DNA using a standard minipreparation procedure.
Next, one can digest a sample of the DNA minipreparation
with a combination of the restriction enzymes Sal I and
Not I and resolve the resultant products on a 0.8%
agarose gel using standard DNA electrophoresis
conditions. The digest will liberate fragments as
follows:

TANGO 180 (EpT180)   1.2 kb and 2.7 kb
TANGO 181 (EpT181)   4.5 kb and 2.7 kb
TANGO 182 (EpT182)   two 2.7 kb fragments
TANGO 183 (EpT183)   1.6 kb and 2.7 kb
TANGO 184 (EpT184)   4.5 kb

The identity of the strains can be inferred from the 
fragments liberated. 
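Inferring the strain from the digest is a lookup against the fragment table for deposit ATCC 98901. The sketch below encodes those expected Sal I/Not I fragment sizes and matches an observed band pattern against them; the ±0.2 kb sizing tolerance is an assumption, not part of the protocol, and the function names are hypothetical. Because co-migrating fragments appear as a single band, the two 2.7 kb fragments of TANGO 182 are collapsed into one band before comparison.

```python
# Expected Sal I + Not I fragments (kb) per strain in deposit ATCC 98901,
# copied from the table in the text.
EXPECTED_98901 = {
    "TANGO 180 (EpT180)": (1.2, 2.7),
    "TANGO 181 (EpT181)": (4.5, 2.7),
    "TANGO 182 (EpT182)": (2.7, 2.7),
    "TANGO 183 (EpT183)": (1.6, 2.7),
    "TANGO 184 (EpT184)": (4.5,),
}


def distinct_bands(sizes, tol):
    """Collapse fragments within tol kb of each other into one gel band."""
    bands = []
    for size in sorted(sizes):
        if not bands or size - bands[-1] > tol:
            bands.append(size)
    return bands


def identify_strain(observed_kb, tol=0.2):
    """Return the strains whose expected band pattern matches observed_kb.

    tol (kb) is an assumed gel-sizing tolerance, not a value from the text.
    """
    observed = distinct_bands(observed_kb, tol)
    matches = []
    for clone, fragments in EXPECTED_98901.items():
        expected = distinct_bands(fragments, tol)
        if len(expected) == len(observed) and all(
            abs(e - o) <= tol for e, o in zip(expected, observed)
        ):
            matches.append(clone)
    return matches
```

For instance, bands at 2.7 kb and 1.2 kb point to TANGO 180, while a lone 2.7 kb band points to TANGO 182. An empty result means the pattern fits none of the five strains within the assumed tolerance.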

Clones encoding TANGO 185, TANGO 186, TANGO 187, TANGO
188, and TANGO 189 (splice variant 1) were deposited on
September 25, 1998 with the American Type Culture
Collection under accession number ATCC 98900, from which
each strain comprising a particular cDNA clone is
obtainable. The deposit is a mixture of five strains,
each carrying one recombinant plasmid harboring a
particular cDNA clone. To distinguish the strains and
isolate a strain harboring a particular cDNA clone, one
can first streak out an aliquot of the mixture to single
colonies on nutrient medium (e.g., LB plates)
supplemented with 100 µg/ml ampicillin, grow single
colonies, and then extract the plasmid DNA using a
standard minipreparation procedure. Next, one can digest
a sample of the DNA minipreparation with a combination of
the restriction enzymes Sal I and Not I and resolve the
resultant products on a 0.8% agarose gel using standard
DNA electrophoresis conditions. The digest will liberate
one vector fragment of 2.7 kb common to all strains, and
one insert-specific fragment as follows:

TANGO 185 (EpT185)     2.1 kb
TANGO 186 (EpT186)     3.7 kb
TANGO 187 (EpT187)     2.6 kb
TANGO 188 (EpT188)     2.0 kb
TANGO 189 (EpT189sv1)  1.3 kb

The identity of the strains can be inferred from the
fragments liberated.
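For deposit ATCC 98900 every strain shares the 2.7 kb vector fragment, so only the insert-specific band distinguishes them. The sketch below returns every strain whose insert lies within an assumed ±0.15 kb of the observed insert band, closest first; the tolerance and the function name are hypothetical. Two caveats follow from the table itself: the TANGO 185 (2.1 kb) and TANGO 188 (2.0 kb) inserts differ by only 0.1 kb, and the TANGO 187 insert (2.6 kb) lies close to the 2.7 kb vector band, so either case may call for a better-resolving gel or an additional digest.

```python
# Insert-specific Sal I + Not I fragments (kb) for deposit ATCC 98900,
# copied from the table in the text. The 2.7 kb vector band is common to
# all five strains and is therefore not listed.
INSERT_98900 = {
    "TANGO 185 (EpT185)": 2.1,
    "TANGO 186 (EpT186)": 3.7,
    "TANGO 187 (EpT187)": 2.6,
    "TANGO 188 (EpT188)": 2.0,
    "TANGO 189 (EpT189sv1)": 1.3,
}


def candidate_strains(insert_kb, tol=0.15):
    """List strains whose insert is within tol kb of insert_kb, closest first.

    More than one result means the band alone cannot resolve the strain.
    """
    hits = [(abs(size - insert_kb), clone)
            for clone, size in INSERT_98900.items()
            if abs(size - insert_kb) <= tol]
    return [clone for _, clone in sorted(hits)]
```

Returning a ranked list rather than a single answer keeps ambiguous cases, such as a band near 2.05 kb, visible instead of silently picking one strain.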

A clone encoding TANGO 215 and four other clones were
deposited on September 25, 1998 with the American Type
Culture Collection under accession number ATCC 98899,
from which the strain comprising the TANGO 215 cDNA clone
is obtainable. To distinguish the strains and isolate a
strain harboring the TANGO 215 cDNA clone, one can first
streak out an aliquot of the mixture to single colonies
on nutrient medium (e.g., LB plates) supplemented with
100 µg/ml ampicillin, grow single colonies, and then
extract the plasmid DNA using a standard minipreparation
procedure. Next, one can digest a sample of the DNA
minipreparation with a combination of the restriction
enzymes Sal I and Not I and resolve the resultant
products on a 0.8% agarose gel using standard DNA
electrophoresis conditions.

The digest will liberate one vector fragment of 2.7 kb
common to all strains, and one insert-specific fragment
as follows:

TANGO 215 (EpT215) 2.8 kb 

The identity of the strain harboring the TANGO 215 cDNA 
clone can be inferred from the fragments liberated. 

Equivalents 

The contents of all references, patents and published
patent applications cited throughout this application are
hereby incorporated by reference. Those skilled in the
art will recognize, or be able to ascertain using no more
than routine experimentation, many equivalents to the
specific embodiments of the invention described herein.
Such equivalents are intended to be encompassed by the
following claims.

What is claimed is: 
1. An isolated nucleic acid molecule selected from
the group consisting of:

a) a nucleic acid molecule comprising a nucleotide
sequence which is at least 55% identical to the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
- , the cDNA insert of a plasmid deposited with
the ATCC as any of Accession Numbers 98899, 98900, and
98901, or a complement thereof;

b) a nucleic acid molecule comprising a fragment of
at least 300 nucleotides of the nucleotide sequence of
any of SEQ ID NOs:1-22, 34-43, and - , the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, or a
complement thereof;

c) a nucleic acid molecule which encodes a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and - or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901;

d) a nucleic acid molecule which encodes a fragment
of a polypeptide comprising the amino acid sequence of
any of SEQ ID NOs:23-33, 54-63, and - , wherein the
fragment comprises at least 15 contiguous amino acids of
any of SEQ ID NOs:23-33, 54-63, and - or the
polypeptide encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901; and

e) a nucleic acid molecule which encodes a naturally
occurring allelic variant of a polypeptide comprising the
amino acid sequence of any of SEQ ID NOs:23-33, 54-63,
and - or an amino acid sequence encoded by the
cDNA insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, wherein the
nucleic acid molecule hybridizes to a nucleic acid
molecule comprising any of SEQ ID NOs:1-22, 34-43, and
- or a complement thereof under stringent
conditions.

2. The isolated nucleic acid molecule of claim 1,
which is selected from the group consisting of:

a) a nucleic acid molecule comprising the nucleotide
sequence of any of SEQ ID NOs:1-22 and 34-43, the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, or a
complement thereof; and

b) a nucleic acid molecule which encodes a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and - or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901.

3. The nucleic acid molecule of claim 1 further 
comprising vector nucleic acid sequences. 

4. The nucleic acid molecule of claim 1 further
comprising nucleic acid sequences encoding a heterologous
polypeptide.

5. A host cell which contains the nucleic acid 
molecule of claim 1. 

6. The host cell of claim 5 which is a mammalian host
cell.

7. A non-human mammalian host cell containing the
nucleic acid molecule of claim 1.
8. An isolated polypeptide selected from the group
consisting of:

a) a fragment of a polypeptide comprising the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and - ,
wherein the fragment comprises at least 15
contiguous amino acids of any of SEQ ID NOs:23-33, 54-63,
and - ;

b) a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and - or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a
nucleic acid molecule comprising any of SEQ ID NOs:1-22,
34-43, and - or a complement thereof under
stringent conditions; and

c) a polypeptide which is encoded by a nucleic acid
molecule comprising a nucleotide sequence which is at
least 55% identical to a nucleic acid comprising the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
- or a complement thereof.

9. The isolated polypeptide of claim 8 comprising the
amino acid sequence of any of SEQ ID NOs:23-33, 54-63,
and - or an amino acid sequence encoded by the
cDNA insert of a plasmid deposited with the ATCC as any
of Accession Numbers 98899, 98900, and 98901.

10. The polypeptide of claim 8 further comprising
heterologous amino acid sequences.

11. An antibody which selectively binds to a
polypeptide of claim 8.
12. A method for producing a polypeptide selected from
the group consisting of:

a) a polypeptide comprising the amino acid sequence
of any of SEQ ID NOs:23-33, 54-63, and - or an
amino acid sequence encoded by the cDNA insert of a
plasmid deposited with the ATCC as any of Accession
Numbers 98899, 98900, and 98901;

b) a polypeptide comprising a fragment of the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and
- or an amino acid sequence encoded by the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, wherein the
fragment comprises at least 15 contiguous amino acids of
any of SEQ ID NOs:23-33, 54-63, and - or an amino
acid sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901; and

c) a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and - or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a
nucleic acid molecule comprising the nucleotide sequence
of any of SEQ ID NOs:1-22, 54-63, and - or a
complement thereof under stringent conditions;

comprising culturing the host cell of claim 5 under
conditions in which the nucleic acid molecule is
expressed.

13. A method for detecting the presence of a
polypeptide of claim 8 in a sample, comprising:

a) contacting the sample with a compound which
selectively binds to a polypeptide of claim 8; and

b) determining whether the compound binds to the
polypeptide in the sample.

14. The method of claim 13, wherein the compound which
binds to the polypeptide is an antibody.

15. A kit comprising a compound which selectively
binds to a polypeptide of claim 8 and instructions for
use.

16. A method for detecting the presence of a nucleic
acid molecule of claim 1 in a sample, comprising the
steps of:

a) contacting the sample with a nucleic acid probe or
primer which selectively hybridizes to the nucleic acid
molecule; and

b) determining whether the nucleic acid probe or
primer binds to a nucleic acid molecule in the sample.

17. The method of claim 16, wherein the sample
comprises mRNA molecules and is contacted with a nucleic
acid probe.

18. A kit comprising a compound which selectively
hybridizes to a nucleic acid molecule of claim 1 and
instructions for use.

19. A method for identifying a compound which binds to
a polypeptide of claim 8 comprising the steps of:

a) contacting a polypeptide, or a cell expressing a
polypeptide, of claim 8 with a test compound; and

b) determining whether the polypeptide binds to the
test compound.
20. The method of claim 19, wherein the binding of the
test compound to the polypeptide is detected by a method
selected from the group consisting of:

a) detection of binding by direct detection of the
binding of the test compound to the polypeptide; and

b) detection of binding using a competition binding
assay.

21. A method for modulating the activity of a
polypeptide of claim 8 comprising contacting a
polypeptide or a cell expressing a polypeptide of claim 8
with a compound which binds to the polypeptide in a
sufficient concentration to modulate the activity of the
polypeptide.

22. A method for identifying a compound which
modulates the activity of a polypeptide of claim 8,
comprising:

a) contacting a polypeptide of claim 8 with a test
compound; and

b) determining the effect of the test compound on the
activity of the polypeptide to thereby identify a
compound which modulates the activity of the polypeptide.
ITATGGAGCTGGCTCCTGCCAAGTCCGGGGCCCGCGCCGCTGCCTAGCGCG T CCT GG 79

M A L 4
GGACTCTGTGGGGACGCGCCCCGCGCCGCGGCTCGCXXXACCCCT ATC GCC CTG CTC 154

SRPALTLLLLLMAAVVRCQE 24 
TCC CCC CCC GCC CTC ACC CTC CTG CTC CTC CTC ATG GCC GCT GTT CTC AGG TGC CAG GAG 214 

QAQTTDWRATtKTIRNGVHK 44 
CAG GCC CAG ACC ACC GAC TGG AGA GCC ACC CTG AAG ACC ATC CGG AAC GGC GTT CAT AAG 274 

I DTYLNAALD LLGG E.OGLCQ 64 
ATA GAC ACG TAC CTG AAC GCC GCC TTG GAC CTC CTG GGA GGC GAG GAC GGT CTC TGC CAG 334 

VKCS OGSKP F P R Y G Y K P S P P 84 
TAT AAA TGC AGT GAC GGA TCT AAG CCT TTC CCA CGT TAT GCT TAT AAA CCC TCC CCA CCG 394 

NGCGSPLFGVHLNIGI PSLT 104 
AAT GGA TGT GGC TCT CCA CTG TTT GGT GTT CAT CTT AAC ATT GGT ATC CCT TCC CTG ACA 454 

KCCNQHORCYETCGKSKND.C 124 
AAG TGT TGC AAC CAA CAC GAC AGG TCC TAT GAG ACC TGT GGC AAA AGC AAG AAT GAC TCT 514 

DEEFQYCLSKICRDVQKTLG 144 
GAT GAA GAA TTC CAG TAT TCC CTC TCC AAG ATC TGC CCA GAT CTA CAG AAA ACA CTA GGA 574 

L TQH VQACE TTVE L L FDS V I 164 
CTA ACT CAG CAT GTT CAG GCA TCT GAA ACA ACA GTC GAG CTC TTC TTT GAC AGT GTT ATA 634 

H LGCKPYLDS^QRAACRCHYE 184 
CAT TTA GGT TGT AAA CCA TAT CTC GAC AGC OLA CGA GCC GCA TCC AGG TCT CAT TAT GAA 694 

E K T D L * 190 
GAA AAA ACT GAT CTT TAA 712 

ACGAGATGCCGAC^GCTAGTCACAGATCAAGATGGAAGAACATA CCTTTCACAAATAACTAAT G TT IT 1'A CAACATAAA 791 

ACTCTCTTA1 1 1 1 lUi CAAAGCATTAT T 1 1UAGACCTTAAAATAATTTATATCTTGATCTTAAAACCTCAAAGCAAAM 870 

AAGTCAGCCAGATAGTCAGGGCAGGGCACC C ' l TCTCTTC T C AGGTA T C T TCCCCAGCATTCCTCCCTTACTTACTATCC 949

CAAA rG TCTT CA CCAATATCAAAAACAA GTCC T TCT I TA GCGCAGA A rTTTG AAAACACCAATATATAACTCAATTTTC 1028 

ACAACCACATTTACCAAAAAAAGAGATCAAATATAAAATTCATCATAATG^ 1107 

GGGGAAATTATCACTTACAAGTA 1 1 rCTT I'ACTATCAAATTTTAAATACACATTTATCCCTACAAAAAAAAAAAAAAAA 1186 



AAAAAAAGGGCCCCCCC 



1203 
JCCCAGCCCCGCCGCGCCGGCCCC 79

MVTPRPAPARGPALLLLL 18
GCAG ATG GTG ACT CCG CGG CCC GCC CCC GCC CGG GGC CCC GCG CTC CTC CTC CTC CTG 137



LLATARGQEQDQTTDWRATL 38 
CTG CTG GCC ACT GCG CGC GGG CAG GAA CAG GAC CAG ACC ACC GAC TGG AGG GCC ACC CTC 197 

KTIRNCIHKIDTYLNAALDL 58 
AAG ACC ATC CGC AAC GGC ATC CAC AAG ATA GAC ACG TAC CTC AAC GCC GCG CTG GAC CTG 257 

LGGEDGLCQYKCSDGS KPVP 78 
CTG GGC GCG GAG GAC GGG CTC TCC CAG TAC AAG TGC AGC GAC GGA TCG AAG CCT GTT CCA 317 

RYGYKPSPPNGCGS P.LFGVH 98 
CGC TAT GGA TAT AAA CCA TCT CCA CCA AAT GGC TGT GGC TCT CCA CTG TTT GGC GTT CAT 377 

LNIGIPSLTKCCNQHDRCYE 118
CTG AAC ATA GGT ATC CCT TCC CTG ACC AAG TGC TGC AAC CAG CAC GAC AGA TGC TAT GAG 437 

T CGK5KHDCDEEF0YCLSKZ 138 
ACC TGC CGG AAA AGC AAG AAC GAC TCT GAC GAG GAG TTC CAG TAC TGC CTC TCC AAG ATC 497 

CRDVQKTLGLSQNVQACETT 158 
TGC AGA GAC GTG CAG AAG ACG CTC GGA CTA TCT CAG AAC GTC CAG GCA TCT GAG ACA ACG 557 

VELLFDSVIHLGCKPYLDSQ 178
GTG GAG CTC CTC TTT GAC ACC GTC ATC CAT TTA GGC TGC AAG CCA TAC CTG GAC AGC CAG 617

RAACWCRYEEKTDL* 193 
CCG CCT CCA TCC TCG TGT CGT TAT GAA GAA AAA ACA GAT CTA TAA 652 

AGACCCTCACTGCTCGAGAGCAGCCCACAATGGAGGATCATCCTTC 741

CCTTA G '1 1 1 Tli f GXCGA TGGGTCA i ' rTTGA GACCTTTCTATA erGTCTCTTTTTI l'AGAACCTCAAAGTCAAAACGCTG 820 

CGGGGCCAGGCAGAAAC^VGAGGGAGAGCATGCTTGGGATGGGGACro f rCCTGAGA 899 

CILUCIOILI lv»Gl wCCIXrCCCCAAACTGCGAAGAAAAGCTTAACCrCC^ 1 TOCTCTI CATAGTTGTACTTAAC 978 

AATAAAAATGAAAGCAAATCTAAAATTCATTGTAAGGA C 1 1 ' ["I 'CACCATTATTTTA f T TTU AAATACAGCCCAATCTTC 1057 

CCTTAGAACTATTATTTA 1 X 1 ' 1 U AAATTTCAGATCTACATTTATACCTGGAAAAACTATTAATTCTCCAT T T I fA TTAT 1136 

ACATAATGTCTTGTTTCTCTGAAGCCCACTAAGATACGTATAAA 1215 

A PCTCT " G TACAGTTGGAATCACCCTTGCTACTTCTCTGCAGACA 1294 

uAATTCAGAAGCCCAGCTTCCTCTCTCACAAACCCCITAGAGTGAATCTCCT^ 1373 

OACGCCTTTAACGCGCCAACCCCACCrrCTGAATO%GTGCGCTATCTC 1452

TTTTXTCATCTTCTATCCTCGACTAGTCTTAAAAGTCTGACAT^ 1531 

TGGTAAAAAAAAAAAAAAAAAAAAAAAAAAGGGCGCCCG 1570 
ACCACCCGTCCCCCCACGlTrrC C GGT C GCG T GCTGAGGGGTGTGA CUG I 1 1 111 iLCTCCTCGGCTCGGACGAGTACGG 79

MAQLGAVVAV 10 
AGCGCCTGCAGGGACAGCCTGGATAAAGGCTCACTG ATG GCT CAG TTG GGA GCA CTT GTG GCT GTG 145 

ASSFFCASL-FSAVHKXEB6K 30 
GCT TCC ACT TTC TTT TGT GCA TCT CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT 205 

IGVYYRGGALLTSTSGPGFH 50
ATT GGG GTA TAT TAC AGA GGC GGT GCC CTG CTG ACT TCG ACC AGC GGC CCT GGT TTC CAT 265 

LMLPFI TSYKSVQTTLQTDE 70 
CTC ATG CTC CCT TTC ATC ACA TCA TAT AAG TCT GTG CAG ACC ACA CTC CAG ACA GAT GAG 325 

VKNVPCGTSG GVM I YFORZ E 90 
GTG AAG AAT GTA CCT TGT GGG ACT ACT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT GAA 385 

VV NFLVPNAVYDIVKMYTAD 110 
GTG GTG AAC TTC CTG GTC CCG AAC GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCT GAC - 445 

YD K A L I P N K I H H E L H Q F C S V 130 
TAT GAC AAC GCC CTC ATC TTC AAC AAG ATC CAC CAC GAA CTG AAC CAG TTC TGC ACT GTG 505 

HTLQEVYIELFDQIDENLKL 150 
CAC ACG CTT CAA GAG GTC TAC ATT GAG CTG TTT GAT CAG ATT GAT GAA AAT CTC AAA CTG 565 

ALOQDLTSMAPGLVIQAVRV 170 
GCT TTG CAA CAG GAC CTG ACC TCC ATG GCC CCT GGG CTG GTC ATT CAA GCT GTG CGG GTA 625 

TKPMIPEAIRRNYELMESEK 190 
ACA AAG CCC AAC ATA CCA GAG GCA ATC CGC AGA AAC TAC GAG TTG ATG GAA ACT GAG AAG 685 

TKLLIAAQKQKVVEKEAETE 210 
ACA AAG CTT CTC ATT GCC GCC CAG AAA CAG AAG GTG CTG GAA AAG GAA CCA GAG ACA GAG 745 

RKKALI EAEKVAQVAEITYG 230 
CGG AAG AAG CCG CTC ATT GAG GCA GAA AAA GTG GCC CAG GTG GCT GAG ATC ACC TAC GGG 805 

QKVMEKETEKKIS E I EDAAF 250 
CAG AAG GTC ATG GAG AAG CAG ACT GAG AAG AAG ATT TCA GAA ATT GAA GAT CCT GCA TTT 865 

LAREKAKADAECYTAMKIAE 270 
CTG GCC CGG GAG AAG CCA AAG CCA GAT GCT GAG TGC TAC ACT GCT ATG AAA ATA CCC GAA 925 

ANKLKLTPEYLQLMKYKAIA 290 
CCC AAT AAG CTC AAG CTA ACC CCT GAA TAT CTG CAG CTG ATG AAG TAC AAG CCC ATT GCT 985

S N S K I YFGKDI PNMFMDSAG 310 
TCC AAC AGC AAG ATT TAC TTT CGC AAA GAC ATT CCT AAC ATC TTC ATG GAC TCT CCG CCC 1045 

SVSKQFEGLADKLSFCLEDE 330 
ACT CTG AGC AAG CAC TTT GAG CCG CTA CCT CAC AAG CTA ACC TTT CGC TTA CAA GAT CAA 1105 



PLSTATKEN* 340
CCC TTC CAG ACG CCC ACT AAG GAC AAT TCA 1135
AAAAAACTTGATATGACTGCAAATGATACTrAACCAGATCrr^ TCLT C CLT CC CC 1214

GACTA CXr i T CT C T G AClVI' C TfCCAGTTACTGTGGTGAAAA 1293 

AGGAGGGTGGGGACTGATGATGGGGGGTTITATTTCAGCT 1372 

GGGCTTGACCTTTGACCTCTAC^CACTAATTT^ 1451

GAGAAATGTAGAGTGTTACCTCCAACTCATTTGATTTCC^ 1530 

TCCAAGCTAGGAGATGTCTCT G CGTGAGGCTCAGCAACT 1609 

AGAAACAGCTCCAGAGAACA ITIUA CCTTCCTGGCAI T C^^ 1688 

TTGAGCCCTCATAAGGAAGTACTGGTGCTAG GTrr^ 1767 

Glt ^ T U i'lTrCTGACTAC Am ^ 1846 
TGTC ArX ' lXi ITTTTTTTTTT7CT I C TCAAAAATTCI GTTCAT I GGTTCCACTCAGCATCAAGAAGACAGGGACAAACAA 1925 

CTCAAGTGTCITAACAGCIX^TGGAGTGGGATCCTTO 2004 

TGCACCTCGAGATGAAGTGTCTTTCTATTATTCTA^ 2083 

GGAACGATCAGTCAAGAGATGTCCTCGTCTTAATGCCr^ 2162 

ACTCTATTCACTAAGTAGCCTGTGTTTTTAAATC 2241 

CAGAGACAGCTGTGTGGAGCAAATCAGAGTTCATGCCCAAGTC 2320 

GC G ' lVl TCTI ' f f r C ATTACTAGGTCACAACA TITTC«A GTCACCTTC 2399 

AAAGGTTTTTCTTATATCCTGAGATTGACC^ 2478 

GTCGAGGCCACCTATCTCGAGTTAACTTCC^ACCATATTGGTGCCCT 2557 

CTTATCAGCTCACCCrCACCCCCCACXCarACCCCCCCCC^ 2636 

TGAATC G T Cr iTT C TCXX^CAATCCCTGC LTlCll ' ri T CC GC 2715 
ACCTTCGCCCACCTTGCTCACAAGTTTTCCCACCATTG^ 10 1 r A CCTCCCCTTACTTCA 2794 

AACITCCQJTrCTLTG' r rC W GACTCCTGCGALTrCTGGTCCTGCGCA 2873 

GTGATCGATTTTAATGTGCTCCAGAGTCCTTTCA^ 2952 

AATCCCAGCALT1TGGGAGGCCAAGGCACGCGGATCACCTCAGCTTACG 3031

ACCCCATCTCTACCAAAAATACAAATATTACCCCGCCATCCTCTCACCCACCTCT 3110 

CCCACGAGAATTCCTTGAACTCCCGACCCAGACCTTCCAGTCAGC 3189 

ACAGCAAGGCTCCCTCTCAAGAAAACAAGGTCATTTCCCAAGACTAGCATACGGACT 3268 
TTCCTCCCA7T7CCGTGCTATTAATCA CL"i G 1 ' TA CAGCAACATGACAATGCCCAGCATCCCACTTCCCGAAAATCTCTA 3347 

CTCCT7CTACTCTGAGCTCTTGTTGCCTAGACCTCAGAA 3426 

TCAGCTTCTCTGAATACCACACTTTCCTCACGTCTTAACTTGAGOT 3505 
CCCATGAGCACAGAAGACCCTCAGTAGACTCAAO M r C CTG CT 1 T 3584

AAACAAAAAATATGTTATCCTACACATTAGTGTCAATCCAATGtj'riXj rCTCI i'ATCTGCTAAATAGCAAAATCATGAAA 3663 
ATCAG ClUTriUA TTTGCATAGGCAACTAACCXTCT 3742

TCTTAAAACATTTGAATTCTAAACATGTAAAATGTGACAK 3821 
ATAAACAGTTACTTA rxrra ATAG A l Xj 'X 1C CA TTTATCAAAATAAGTAACTGTTTATAAAATTCA G i 1TTTGTAGGGTT 3900 
TTCCAAGGAAAAATCaC tTlXiG ' l ^ G AATOITTCT CA CTCATT 3979 
TAATCA LT m TA AAATATAAGGACCGAATGCAAGGAAACCAAAGTTTATTAATA A ' l 1 ITl ' A TATAACTAAAATAAAAT 4058
AGATGTCGACXX^TCTGTGATCATATAAAAACGGAGGGTTACniAA 4137 
AG CIGrmA TAAATGATCATTCA lTC 4216 
GTAAATAGTGAAAGTAAGATGGTCATACTTACTGACTTTATCTAT^ 4295 
GTGGCTACTGTGTCTGTGAATGTAACCACTAL'ri CI l' 1 AAGCTCTATTCAGTAGGCTTCCAGCCAC TGC T T TTTTGTrG 4374 
TTTCTAGCCA CXXT ri ' lTlTlTrf CrrUTl ' lCC I T A TAAAACACGTAATAACCAAAAAAAAAAAAAAAAGGGCGGCCG 4451 
iCTGATGGCGTCATCGAAGCGACTGGCCCGGAAGGAAGTAGGGTG 79




rACGGTTCCAi 




'ACGGAGCGCCTGGAGGGACAGCCTGGATACAG 158 



MAQLGAVVAVASSFFCA 17
GTTCACTG ATG GCT CAG TTG GGA GCT GTT GTG CCC GTG GCT TCC ACT TTC TTT TGT GCA 217



SLFSAVHKIEEGHIGVYYRG 37 
TCT CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT ATT GGA GTA TAT TAC AGA GCT 277 

GALLTSTSGPGFHLMLPFIT 57
GGT GCC CTG CTG ACC TCC ACC ACT CGC CCG GCT TTC CAT CTC ATG CTC CCG TTC ATC ACA 337 

SYKSVQTTLQTDEVKNVPCG 77 
TCC TAT AAG TCT GTA CAG ACC ACT CTC CAA ACT GAT GAA GTG AAG AAC GTA CCA TGT GGA 397 

TSGGVMIYFDRIEVVNFLVP 97 
ACC AGT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT GAA GTG GTG AAC TTC CTG GTC CCA 457 

NAVYDIVKNYTADYDKALIF117 
AAT CCA GTC TAT GAT ATA GTG AAG AAC TAT ACT GCA GAC TAT GAC AAG GCC CTC ATC TTC 517 

NKIHHELNQFCSVHTLQEVY 137
AAC AAG ATC CAT CAT GAG CTT AAC CAG TTC TSC ACC GTT CAT ACT CTT CAG GAA GTC TAT 577 

IELFDQIDENLKLALQQDLT 157
ATC GAG CTG TTT GAT CAA ATT GAT GAA AAC CTC AAG TTG GCT TTG CAG CAG GAC CTG ACT 637 

SMAPGLVIQAVRVTKPNIPE 177
TCC ATG GCC CCT CGG CTG GTT ATC CAA GCT GTG CGA GTG ACA AAG CCC AAT ATA CCT GAG 697 

AIRRNYELMESEKTKLLIAA 197
GCA ATC CGC AGG AAC TAT GAG CTG ATG GAA AGC GAG AAG ACG AAG CTT CTC ATT GCA GCC 757 

QKQKVVEKEAETERKKALIE 217
CAG AAG CAC AAC GTG GTG GAA AAG GAG GCA GAA ACA GAG AGG AAC AAG GCC CTC ATT GAG 817 

AEKVAQVAEITYGQKVMEKE 237
CCA GAA AAA GTG GCA CAG CTT GCA GAA ATC ACC TAT GGG CAA AAG GTG ATG GAG AAG GAG 877 
MNMTQARV 8
GTCGACCCACGCGTCCGGCGGCTGGGCTTCTTCTCAGAGGAACGAGA ATG AAT ATG ACT CAA GCC CGG GTT 71 

LVAAVVGLVAVLLYASIHKI 28 
CTG GTG GCT GCA GTG GTG GGG TTG GTG GCT GTC CTG CTC TAC GCC TCC ATC CAC AAG ATT 131

EEGHLAVYYRGGALLTSPSG 48
GAG GAG GGC CAT CTG GCT GTG TAC TAC AGG GGA GGA GCT TTA CTA ACT AGC CCC ACT GGA 191 

PGYHIMLPFITTFRSVQTTL 68
CCA GGC TAT CAT ATC ATG TTG CCT TTC ATT ACT ACG TTC AGA TCT GTG CAG ACA ACA CTA 251 

QTDEVKNVPCGTSGGVMIYI 88 
CAA ACT GAT GAA GTT AAA AAT GTG CCT TGT GGA ACA AGT GGT GGG GTC ATG ATC TAT ATT 311 

DRIEVVNMLAPYAVFDIVRN 108
GAC CGA ATA GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAT ATC GTG AGG AAC 371 

YTADYDKTLIFNKIHHELNQ 128
TAT ACT GCA GAT TAT GAC AAG ACC TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG 431 

FCSAHTLQEVYIELFDQIDE 148
TTC TGC AGT GCC CAC ACA CTT CAG GAA GTT TAC ATT GAA TTG TTT GAT CAA ATA GAT GAA 491 

NLKQALQKDLNLMAPGLTIQ 168
AAC CTG AAG CAA GCT CTG CAG AAA GAC TTA AAC CTC ATG GCC CCA GGT CTC ACT ATA CAG 551 

AVRVTKPKIPEAIRRNFELM 188
CCT GTG CGT GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT CAG TTA ATG 611 

EAEKTKLLIAAQKQKVVEKE 208
GAG GCT GAG AAG ACA AAA CTC CTT ATA GCT GCA CAG AAA CAA AAG GTT GTG GAA AAA GAA 671 

AETERKKAVIEAEKIAQVAK 228
GCT GAG ACA GAG AGG AAA AAC GCA GTT ATA GAA GCA GAG AAG ATT CCA CAA GTG GCA AAA 731 

IRFQQKVMEKETEKRISEIE 248
ATT CGG TTT CAG CAG AAA GTC ATG GAA AAA CAA ACT GAA AAG CCC ATT TCT GAA ATC GAA 791 

DAAFLAREKAKADAEYYAAH 268
GAT GCT CCA TTC CTG GCC CGA GAG AAA GCC AAA GCA GAT GCT GAA TAT TAT GCT GCA CAC 851 

KYATSNKHKLTPEYLELKKY 288
AAA TAT GCC ACC TCA AAC AAG CAC AAG TTG ACC CCC GAA TAT CTG GAG CTC AAA AAG TAC 911 

QAIASNSKIYFGSNIPMMFV 308
CAG GCC ATT CCT TCT AAC AGT AAG ATC TAT TTT GGC AGC AAC ATC CCT AAC ATC TTC GTG 971 

DSSCALKYSDIRTCRESSLP 328
CAC TCC TCA TGT GCT TTG AAA TAT TCA GAT ATT AGG ACT GCA AGA GAA ACC TCA CTC CCC 1031 

SKEALEPSCENVIQNKSSTG 348
TCT AAG GAG CCT CTT GAA CCC TCT GCA CAG AAC GTC ATC CAA AAC AAA GAG ACC ACA CCT 1091 
TGCAAGW3GTGGAAATGITCTC^ 1173



CTCATGAATGAGGAAAGTCnt^TGCTAAGATACTCCCTGCATTCCC^ ' 1410 

GCAGGCCATGCTTGACTAACGTAC CI GOT T X 1 A GCCACACCCACCT C L 1 1 UTA TGTTA CLTl ' l CAGCTCTGGCCAAGAO 14 
TGGGACACGGTTTTAACCACAAATAGC^^ 1568 
GGAA GTTTTTATTTTT AAAACTGGATCTGGOGTATATTCft 1647 
GCTGCCATGGTCACAAGCAC^CTGATGCrCCITAACU 1726 
TAGAAAGCAT CC TT GG TCATCATIGTCTCCTTC C CACC^ 1805 
CACCTCCXCCVGGAGATCAGGArrcXACTGACCrrc 1884 
TAACCTGTGGCATTAGGAGACCTACTTCATGTCGACC L 1 1 mTICCmA CTrTTAAC^^ 1963 
GTAGTTCGGCCTGAu 1 1 i\> UiCAGC 1" rUTTAAGACAAc i v. 1 i\j lACACTATGTTGAAGCTCAACAAAAAAGTCATGG 2042 
. GACCAu a 11- i ACAAATC i 1 1 CAGu i u 1 CAGGCl 1 u 1 CAu 1 U"lt^TGACAo rri^i iiXiTTGlXSCCAAACAL'l TTATTTG 2121 
GGAAAGGAAAGCCCAGATTTGAATGGGTCTTTCCCC^ 2200 
TTTTTCATTTTTGCTCATTTAATTCTATAAATTCTCTTTATAA^ 2279 
TTTTGAATTATAAAAATAAAATCTTTACCTCTCGA^ 2358 
CTCTGT G CTT TC ATTCCTAGACA TGTTTTA TACCT 2437 
GGCCTGAGGAACAGCGAAATtXICCCTGTGAACTCTTO 2516 
CTCTCCCCICTCAGCTClXjACGCrGGCCUTCrrTCGGGGTGiTCLi I TTGGCAAATATACACTGTAATCTTGACTCTAA 2595 
ATTTATATGTTGAAATGCTA CCTTrrrrA AAATAAGAAACTAAAT 2674 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2704 



GATTTACAGAGAACTTACACTTCA' 




kTTGGAGGATAGAG 1252 



ccagctgtctgacacacaaa: 




rATCAAGTATCCTATATGTATTCCTTTCTAAACTGCTA 1331 
MIYIDRI 7
GTCXSACCCACGCGTCCGTAAAAATGTGCCTTCTGGAACAAGTGGT^ ATG ATC TAT ATT GAC CGA ATA 72 

EVVNMLAPYAVFDIVRNYTA 27
GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAC ATT GTG AGG AAC TAT ACT GCA 132 

DYDKTLI FNKIHHELNQFCS 47 
GAC TAC GAC AAG ACT TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG TTT TGC ACT 192 

AHTLQEVYIELFDQIDENLK 67
GCC CAC ACA CTT CAA GAA GTT TAC ATA GAA TTG TTT GAT CAA ATA GAT GAA AAC CTG AAG 252

QALQKDLNTMAPGLTIQAVR 87
CAG GCC CTG CAA AAA GAT TTA AAC ACC ATG GCC CCA GGT CTC ACT ATC CAG GCT GTG CGT 312 

VTKPKIPEAIRRNFELMEAE 107
GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT GAA TTA ATG GAG GCA GAG 372 

KTKLLIAAQKQKVVEKEAET 127
AAG ACA AAA CTT CTC ATA GCT GCA CAG AAA CAA AAG GTG GTG GAG AAA GAA GCT GAG ACG 432 

ERKRAVIEAEKIAQVAKIRF 147 
GAG AGG AAA AGG GCT GTT ATA GAA GCA GAG AAG ATT GCA CAA GTA GCA AAA ATT CGA TTT 492 

QQKVMEKETEKRISEIEDAA 167
CAA CAG AAA GTG ATG GAG AAA GAA ACT GAA AAA CGC ATT TCT GAG ATT GAA GAT GCT GCG 552

FLAREKAKADAEYYAAHKYA 187
TTC CTG GCC CGA GAG AAG GCA AAA GCA GAT GCC GAG TAT TAC GCT GCA CAC AAA TAC GCC 612 

TSNKHKLTPEYLELKKYQAI 207
ACC TCA AAC AAG CAC AAA CTC ACC CCA GAG TAT CTG GAG CTC AAG AAA TAC CAG GCC ATT 672 

ASMSKIYFGSNIPSMFVDSS227 
GCC TCA AAC ACT AAG ATC TAC TTT GGC AGC AAC ATC CCC AGC ATG TTT GTG GAC TCC TCC 732 

CALKYSDGRTGREDSLPPEE 247
TGT GCT CTG AAA TAC TCT GAT GGT AGG ACT GCG AGA GAA GAC TCC CTT CCC CCA GAG GAC 792 

AREPSGESPIQNKENAG* 265
GCC CGT GAG CCC TCT GGA GAG ACC CCC ATC CAA AAC AAG GAG AAC GCA GGT TCA 846 

TGCAAGAGCTGCAAATCTTCTCCCATATCAAGATGCGACCCAAGGGCCT 925 

AGATTCACAGAGAATGTGTCCTCTG'l"f G'l'UA'n'C'l'C T IXjIXATAGTCCT GG TTT GC CAGCTGACTACAGGATAGACCCA 1004 

CCrGTCTGCCACTCAAACGGTCTCTGCAGCCACAGTTITATCAACT 1083 

ATGAATGAGCGAAAGTCTCATGCTAACATACTCCCTGCACTGC 1162

AAGCTATTGAATAATGTTTACATTGGTCCCTGAG 1241 

ACCTTCAGAAAACGGTAAGTTAAAGAAGACAACTCTCATCAGAC^ 1320 

GGCATTCCTCCATGTGATTCACAGCCAGACCTCTCGCTTCrcAG 1399
TACAAATTCTAC d ' TTL ! Ib 1 1 1 1 1 L 1 AGTCAGC rK^TGGCCTGCAGGGACGCGTACTTTGCC^ 1478
CTCGAAC^TATTCCC^TCACTACrrTT A ri , U .O l l rAGGAGACTCAGAGATATAGAAAGCAGCTGAAATTTAAGGGAGAT 1557 

AAAGCCTGCACTGCACCAAAGCTACGGGTCCCTC7GTTTCCTCT 1636 

ATGTGTGACTAAACTGCCCCGTTTTAGCCACAGACAACTGCT^ 1715 

GCTTTAACCAGACATAGGAGCAGTGTGCAATTCCTGATTCACT^ 1794 

TG ' 1 ' 1 ril ' A AAACTGGATTTGGGGCACATTCATTCACCCCAACACTTCTATCTA 1873 

GTCACTAACACACTGATTCTCCTTAAAGTAATTCTCGAA 1952 

GTGACT7CC7GGGCAGCCATTGAATTCATTTTCCATGAGAAGATCACAG 2110 

ATCCAGACCTTTTTGCCCATCACATTAACTTTCCTGCAATA 2189 

TGACACCT CTTCTG rA TALTUTUTlXSAACCCAGACAGAAAAGTAATG 2268 

TCACAGCACCTAAAGGGT1 GIXX CAAACATTTTATTAAGAAAGTAAACCCCAGATTTGAATCGG G^ 2347 
TTATAGTATAGAGGCATTTGTAATATGGAGAAAATAATTTTTCTCATTT AATTATAGAAATTACCTTCAAACAGATTTT 2426 
GTG , rrCT , n , CX3CCCTTCAAATACI , GGTCTTACAl 1U1 1 GCTGCAGATAAATGATCATTCTCGTCGGATATCTGGATCAC 2505 

TGAGCTCTGTCCTTTCATTCCTAGAGATCrrTTCTCAOT 2584 

GCATTTCTTACCCGTCATACGCCCCGGTGAGGAGCACGGAAGCGCCATO 2653 

AGCTCCTTATGGAGTGAGCTTCCCTGTGCCCACTCAGTGAACTAAGTCTGACC^ 2742 

AATATACACTGTAATCTTTAAGTCTAAATTTATATCTGA^ 2821 

TATCAAAAAAAAAAAAAAAAAGGGCGGCCG 2851 
MKLLSLVAVVGCLLV 15
AGCAAGCCTGATAAGC ATG AAG CTC TTA TCT TTG GTG CCT GTG GTC GGG TGT TTG CTG GTG 140



PPAEANKSSEDIRCKCICPP 35
CCC CCA GCT GAA GCC AAC AAG ACT TCT GAA GAT ATC CGG TGC AAA TGC ATC TGT CCA CCT 200 

YRNISGHIYNQNVSQKDCNC 55
TAT AGA AAC ATC ACT GGG CAC ATT TAC AAC CAG AAT GTA TCC CAG AAG GAC TGC AAC TGC 260 

LHVVEPMPVPGHDVEAYCLL 75 
CTG CAC GTG GTG GAG CCC ATG CCA GTG CCT GGC CAT GAC GTG GAG GCC TAC TGC CTG CTG 320 

CECRYEERSTTTIKVIIVIY 95 
TGC GAG TGC AGG TAC GAG GAG CGC AGC ACC ACC ACC ATC AAG GTC ATC ATT GTC ATC TAC 380 

LSVVGALLLYMAFLMLVDPL 115
CTG TCC GTG GTG GGT GCC CTG TTG CTC TAC ATG GCC TTC CTG ATG CTG GTG GAC CCT CTG 440 

IRKPDAYTEQLHNEEENEDA 135
ATC CGA AAG CCG GAT GCA TAC ACT GAG CAA CTG CAC AAT GAG GAG GAG AAT GAG GAT GCT 500 

RSMAAAAASLGGPRANTVLE 155
CGC TCT ATG CCA GCA GCT GCT GCA TCC CTC GGG CGA CCC CGA GCA AAC ACA GTC CTG GAG 560 

RVEGAQQRWKLQVQEQRKTV 175
CCT GTG GAA GGT GCC CAG CAG CGG TGG AAG CTG CAG GTG CAG GAG CAG CCG AAG ACA GTC 620 



FDRHKMLS* 184
TTC GAT CGG CAC AAG ATG CTC AGC TAG 647



ATQQGCXtSLSlXiriXMTZWM 726

CCCTTCCCl X XSG'rfCCA G U X Tr CC CTTT A AAAGCCTGTGGC A 1 1 1 1 fCC TCCTTCTCCCTAACTTTAGAAATGTTGTAC 805 
TTGCCTATTTTCATTAGGGAAGAGCG A 'roTGCT CT CT TGT C ' t ' l L I f CG GTCTTTCGGCTTGAAGGGAGCG 884 

GCAACGCAGGCCAGAAGGGAATGCAGACATTCGAGCCCCCCTCAGGAGT^ 963 
TTCCCGCCTTCO\GCrCnX^ 1042 
GGGAGGAAAGCATGGCCCAGCATTCAGCATGTGTZ CC X 1 TCTGCAGTGGTTC 1 rTAT^CCACTTCrCTCCCAGCCCCA 1121 
CKXCCTCACCCCXACarCC^CrcCACCCXTCAC^ 1200 

ATGGAGTCCCCATGCATACTCT 1279 
ACAGTCACTGAGCCAGAC 1358 
GGTCGCTTGGAACATCAGACTCCAGGCTCAGCGTXX3ATCTC 1437 
TGAA C TTC G TT C TACCAGTGCATGGAGAGAAAATTTTCTCC 1 V I 'I'G'rC'I'l'AGAGTTGTGTGTAAATCAAGGAAGCCATC 1516 
ATTAAA HOl 1 1 1 A 1 1 ICILA AAAAAAAAAAAAAAAAAAAGGCCCCCCC 1565 
TAGTCTTGCTGGCTCAGCAA 79

MKLLCLVAVVGCLLVPP 17 
GCCCGATAAGC ATG AAG CTG CTG TGT TTG GTG GCT GTG GTG GGG TGC TTG CTG GTG CCC CCA 141 

AQANKSSEDIRCKCICPPYR 37 
GCT CAA GCC AAC AAG AGC TCT GAA GAT ATC CGG TGC AAA TGC ATC TGT CCG CCT TAG AGA 201 

NISGHIYNQNVSQKDCNCLH 57
AAC ATC AGC GGG CAC ATT TAC AAC CAG AAT GTG TCT CAG AAG GAC TGC AAC TGC CTG CAT 261 

VVEPMPVPGHDVEAYCLLCE 77
GTG GTG GAG CCC ATG CCA GTG CCT GGC CAC GAT GTG GAA GCC TAC TGC CTG CTC TGC GAG 321 

CRYEERSTTTIKVIIVIYLS 97 
TGT AGG TAC GAG GAG CGT AGC ACC ACA ACC ATC AAG GTC ATT ATT GTC ATC TAC CTG TCT 381 

VVGALLLYMAFLMLVDPLIR 117
GTG GTG GGG CCC CTC TTA CTC TAC ATG GCC TTC CTG ATG CTC GTG GAC CCG CTC ATC CGG 441 

KPDAYTEQLHNEEENEDART 137 
AAG CCA GAT GCC TAT ACT GAG CAG CTG CAC AAT GAA GAG GAG AAT GAG GAT GCT CGC ACC 501 

MATAAASIGGPRANTVLERV 157
ATG GCA ACA GCC GCT CCG TCC ATT GGA GGA CCC CGG GCA AAC ACT GTC CTG GAG CGG GTG 561 

EGAQQRWKLQVQEQRKTVFD 177
CAA GGC GCT CAG CAG CGG TGG AAG CTG CAG GTG CAG GAG CAG CGG AAG ACC CTC TTC GAC 621 

RHKMLS* 184
CCA CAC AAG ATG CTC AGT TAG 642 

ATGGTTTCCATCATTGCATCAGAGACCITXjGCCATGGCTACC 721 

CCTTCAAATGCCCATGG CO 1 1 T A TCCT T C T C CCTCTCTAGAAATCTACTCGA C T G T T ATAACGACGCAGTGTGATTGGG 800 

TCTCTGTAGGTCTCTGGGGGCTAGAGGGGAGGGGAGGGAAGGCAGAAGGCAACACAGA 879 

TGGGTCGAATTCATCCCTCCTCTCTTCACCATTCCTCCCAGCT 958 

GTCATCAAGAGCTCACTGGGTCCGAGGAAAGTATGATCCACCGCTCAGCCT 1037 

CCAGTTCCTTCAGTGCCAGTACrTTAACTTGGCCTACCCCAGTCTCAGGAACTG 1116 

ATCTCCAGAGTCCACCTOGAACC L 1G 1 1 CCCCTCTCCTCGGCTCCTCGTCCACCAGTGCATCGCAGTGCCCATCCATGC 1195

CGGCATATTCAGCAGCTGTCACCTrACTCCCATCCCAGGAGGCCGTAAGGCCrCCCACCTCT^ 1274 

GCTGACCCATAAAGTTGGACCATATGACACAAGGCCAATGGGGACCCGAGTACC^ 1353 

TTGTCCCTCAATTTCATTGTATCATGCATGGAGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1432 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGC 1510 
GAATTOGGCACGAGGGGAT

MATLWG 6
CCGGGAGCCGGTCGCGGGGGCTCCGGGCTGTGGGACCGCTGGGCCCCCAGCG ATG GCG ACC CTG TGG GGA 149



GLLRLGSLLSLSCLALSVLL 26 
GGC CTT CTT CGG CTT GGC TCC TTG CTC AGC CTG TCG TGC CTG GCG CTT TCC GTG CTG CTG 209

LAQLSDAAKNFEDVRCKCIC 46
CTG GCG CAG CTG TCA GAC GCC GCC AAG AAT TTC GAG GAT GTC AGA TGT AAA TGT ATC TGC 269 

PPYKEMSGHIYNKNISQKDC 66 
CCT CCC TAT AAA GAA AAT TCT GGG CAT ATT TAT AAT AAG AAC ATA TCT CAG AAA GAT TGT 329 

DCLHVVEPMPVRGPDVEAYC 86 
GAT TGC CTT CAT GTC GTG GAG CCC ATG CCT GTG CGG GGG CCT GAT GTA GAA GCA TAC TGT 389 

LRCECKYEERSSVTIKVTII106 
CTA CCC TGT GAA TGC AAA TAT GAA GAA AGA AGC TCT GTC ACA ATC AAG CTT ACC ATT ATA 449 

IYLSILGLLLLYMVYLTLVE 126
ATT TAT CTC TCC ATT TTG GGC CTT CTA CTT CTG TAC ATG GTA TAT CTT ACT CTG GTT GAG 509 

PILKRRLFCHAQLIQSDDDI146 
CCC ATA CTG AAG AGG CCC CTC TTT GGA CAT GCA CAG TTC ATA CAG ACT GAT GAT GAT ATT 569 

GDHQPFAMAHDVLARSRSRA166 
GGG GAT CAC CAG CCT TTT CCA AAT GCA CAC CAT GTG CTA GCC CGC TCC CGC AGT CGA GCC 629 

NVLNKVEYAQQRWKLQVQEQ186 
AAC GTG CTG AAC AAG GTA GAA TAT GCA CAG CAG CGC TGG AAG CTT CAA GTC CAA GAG CAG 689 



RKSVFDRHVVLS* 199
CGA AAG TCT GTC TTT CAC CGG CAT GTT CTC CTC AGC TAA 728



TTGGGAATTGAATTCAAGCTCACTAGAAAGAAACAGGCAGACAACTGGAAAGAACTG 807 
TTTAATACCT'lXjl TUATTTCACCAACTCTTGCT I rTTTTI l ' T CT 886 

TGTTAACGTAATAATAGAGACATTlTTAAAACCyiCACAGCTCA 965 
TACTAATAAAAATAAATCTCKCTCTAAATTATCTTGAA 1044 
TTITAACTTGACTTTCAAGATAATTTTCAGG li 1 1 1 1 ll^HUl rGTTGTI 1 1 1 K5TTTCT I r CTTTTGCTCC CACAGGGG 1123 
ACGGATCCCTCCCAAGTGGTTAACAA C ' l ' i 1 1 i I C AACTCACTTTACTAAACAAA CT i 1 T(J ' I AAATAGACCTTACCTTCT 1202 
ATTTTCG ACTTTCATTT AT ATTTTC CACTGTAGCC AC CCTCATCAAAGACCTC ACTTACTCATTTGA CTTTTCCACTC A 1281 
CTGTCTTATCTGGCTATCTCCTGTGTCTCCACTTCATCGTAAACGGG 1360
CACATTTTCTTO\TCTACTCTCATCTCTCATCC.VATCCATCCTAC 1439 
CTAAACATACTCTTCCTGTGTGTCCTCTTACTCATCTTCTAC^ 1518
CAA T AAAG AAATTTT ATTTT AAAAAAAAAAAAAAAAAAAAA CTG CCCCCG C 1569
GTCGACCCA

MASLW 5
CCCGCG ATG GCG AGC CTA TGG 73



CGNLLRLGSGLSMSCLALSV 25 
TGC GGA AAC CTG CTG CGG CTG GGC TCG GGG CTC AGC ATG TCC TGC CTG GCG CTG TOG GTG 132 

LLLAQLTGAAKNFEDVRCKC 45
CTG CTG CTC GCG CAG CTG ACA GGC GCC GCC AAG AAT TTT GAA GAT GTG AGA TGT AAA TGC 193 

ICPPYKBNPGHIYNKNISQK 65 
ATC TGC CCT CCC TAT AAA GAG AAT CCT GGG CAC ATT TAT AAT AAG AAT ATA TCT CAG AAA 253 

DCDCLHVVEPMPVRGPDVEA 85 
GAT TGT GAT TGC CTT CAT GTC GTG GAG CCC ATG CCT GTA CGG GGA CCT GAT GTA GAA GCA 313 

YCLRCECKYEERSSVTIKVT 105 
TAC TGT CTA CGC TGT GAA TGC AAA TAC GAA GAG AGA AGC TCT GTC ACA ATC AAG GTT ACC 373 

IIIYLSILGLLLLYMVYLTL 125
ATT ATA ATT TAT CTC TCT ATT TTG GGC CTT CTG CTT CTG TAC ATG GTA TAT CTT ACC TTA 433

VEPILKRRLFGKSQLLQSDD 145
GTT GAG CCC ATC CTG AAG AGG CGC CTC TTT GGA CAC TCC CAG CTG TTG CAG AGC GAT GAT 493 

DVGDHQPFANAHDVLARSRS 165
GAC GTT GGG GAT CAC CAG CCT TTT GCA AAT GCC CAT GAT GTG CTG GCC CGC TCT CGC AGC 553 

RANVLNKVEYAQQRWKLQVQ 185
CGA GCC AAT GTT CTA AAC AAG GTG GAG TAC GCT CAG CAG CGC TGG AAG CTC CAG GTC CAG 613 

EQRKSVFDRHVVLS* 200
GAG CAG CGA AAG TCT GTC TTC GAC CGA CAC CTT GTC CTC AGC TAA 658 

CTCGGAACTGGAATCAGGTGACTACGAAGAACACGCACACAACTGGGAAGA^ I TTTA ATG 737 

CCA rCTTTGTTn 1 ' A CAAATCCrr CCnjG ATCGAGGAAGACTCCAAACTCGA 816 

GTTAATATATTAATACAGACATTTTTACAGCACACAGTTCCAAGTO^C^ 895 

CTAATAAAAITAAGCT G CCTGTG A GTT ATCIU ' O AAGCCC I C 1 1 rC TI G CCACACAGTTC 974 

TAAl 1 r GGTGTI C AAGATAACTTCCA GG rGTGTTI TTGCTTC r CTTl CTlV l U; 'l tX XSAGAC^UVGC^UGGATCCCCT 1053 

GCGAGTCCTTGAGTAGCTrCTCAAGTGTCTTTTCCAGA 1132 

AATGTCCCAGTGTACCTTOCrTGTCAGCGTCCT^ 1211 

GGTTAGCCTG7GGCTGCATTTCATGACCAGTTGGATCTCAAATC T TTG TT'l 'CA 1290 

TGCACTGTGATGTCTGACGCAACATGTTCTAGAACAGACTGGCCATC 1369 

CAGTGTCTGTCkTTCTTCCrCATCTTCTTCTACT 1448 

CCCAACCCTCCCTGGATGATTGACGTACAAATACTGATCAGC C 1 T 1 1 C I U I' CI ' IU CTGAGAGGCA Cj 1 1 L ' l 1 ' lU AACTGA 1527 



TCTCXWCAGCTTrJAACAACXSACTACACrraVGATTG 1606 
1 T TCTTCCTA CAT CCTCTT ' 1 G GAATGTAACAATAAAATAATTTACAAAACCCAAAAAAAAAAAAAAGGGCGGCCG 1681
GCGGCCTCrrCCCI 1 1 lUlOG CGGCGCCCCCGCTCGCAGC^C^CTCrCTG^ 158

MIRCGLACE 9 

CCGCTCCGCrCCGCrCCGCTCGGCCCCGCGCCG(X^ ATG ATC CGC TGC GGC CTG GCC TGC GAG 227 

RCRWILPLLLLSAIAFDIIA 29 
CGC TGC CGC TGG ATC CTG CCC CTG CTC CTA CTC AGC GCC ATC GCC TTC GAC ATC ATC GCG 287 

LAGRGWLQSSDHGQTSSLWW 49

CTG GCC GGC CTC GGC TGG TTG CAG TCT AGC GAC CAC GGC CAG ACG TCC TCG CTG TGG TGG 347

KCSQEGGGSGSYEEGCQSLM 69

AAA TGC TCC CAA GAG GGC GGC GGC AGC GGG TCC TAC GAG GAG GGC TGT CAG AGC CTC ATG 407 

EYAWGRAAAAMLFCGFIILV 89

GAG TAC GCG TGG GGT AGA GCA GCG GCT GCC ATG CTC TTC TGT GGC TTC ATC ATC CTG GTG 467 

ICFILSFFALCGPQMLVFLR 109

ATC TGT TTC ATC CTC TCC TTC TTC GCC CTC TGT GGA CCC CAG ATG CTT GTC TTC CTG AGA 527 

VIGGLLALAAVFQIISLVIY 129

GTG ATT GGA GGT CTC CTT GCC TTG GCT GCT GTG TTC CAG ATC ATC TCC CTG GTA ATT TAC 587 

PVKYTQTFTLHANPAVTYIY 149

CCC GTG AAG TAC ACC CAG ACC TTC ACC CTT CAT GCC AAC CCT GCT GTC ACT TAC ATC TAT 647 

NWAYGFGWAATIILIGCAFF 169

AAC TGG GCC TAC GGC TTT GGG TGG GCA GCC ACG ATT ATC CTG ATT CCC TGT GCC TTC TTC 707 

FCCLPNYEDDLLGHAKPRYF 189 

TTC TGC TGC CTC CCC AAC TAC GAA GAT GAC CTT CTG GGC AAT GCC AAG CCC AGG TAC TTC 767 



YTSA*
TAC AGA TCT GCC TAA 
CTTGCCAATGAATGTGGGAGAAAATCCCT G CTGCTCAGATGGACTCCAGAAG IG 'f 1 T CT CCAGGCGACTTTG 861

AACCCA lTn 1 1 C GCAGTGT T C ATATTATTAAACTACTCAAAAATGCTAAAATAATTTGGGAGAAAATA l"l"l"i"l"l ' A ACT 940 
ACTGTTATA C ITTCA POTT TATCTTTTA TT A' tCT 1 1 ICTG AA GTTGTG rCTTT ICA CTAATTACCTATACTATGCCAAT 1019 
ArTTCCTTATATCTATCCATAACATTTATACTACA 1 TTG lAAGAGAATATGCACGTGAAACTTAACACTTTATAAGGTA 1098 
AAAATCAG U T 1TC CAACATTTAATAATCTCATCAA CTC 1177 
T AAGGAC AAG AGCAAGATAAGGTT AAAAli 11 U I I' AATGACXAAACATTCT AAAAGAAATCCAAAAAAAAACTTrATTTT 1256 
CAAGCC7TCGAACTATTTAACGAAAGCAAAATCATTTCCTAAATC 1335 
GAATCATTCATTTTAGCTAACarrTCATGrTCACTCGATATCT 1414 
TCCCATACTTGCTAAGCCTTTCCTTTAAGTGTGAAATATT^ 1 ITTCTCn 1 1A AA U1 iLl 1 1A TAGGGTTA 1493 

CCGTGTGCGAAAATGCTATATTAATAAATCTGTACi 1 li 1 1 S ItWGl 1 '1 ATATGTTCAGAACCAGAGTAGACTGGATTGAA 1572 
AGATXy^ACTGGGTCTAATTTATCATGACTGATAGATCTGGTTAAGTTGTGTAGTA 1651 
CACAAAACTCCCACTAAAAOlGCCTCACGAGAATAAATGACVlX*CTr 1 T L rAAATCTCAGGTTTATCTGGGCTCTATCA 1730

TATAGACAGG C IT L T U ATA UlTXTiC AA 1 TTCTTG GTAAACA 1809 

GATTTTAAATGTCTGATATAAAACATGCCACACGACAATTCGGGG^ 1888 

ATCGGATAGGTCATTATGATTTTTTACCATTTCCACT^ 1967 

TTTTGTAAGTTGTGGAAAAAGCTAATTGTAG 1 i I i CATTATGAAG1 1 1 i CCCAATAAACCAGGGCATTCTAAAAAAAAA 2046 

AAAAAAAAAAAGGGCGGCCGC 
2067 
GTOGACCCACGCGTCCGGCGCTCTGAGTCACCGGAATCAAGGTGTGGCTGGAC^^



79 



ETCTTGGCCGCTGCTCCT 158 

MLRCGLACE 9 
GCCCGGCGTTCCTCCGCTCCGCGCCCGCCCCCACCGACGAC ATG CTG CGC TGC GGC CTG GCC TGC GAG 226 

RCRWILPLLLLSAIAFDIIA 29 
CGC TGC AGG TGG ATC CTG CCC CTG CTG CTG CTC AGC GCC ATC GCC TTC GAC ATC ATC GCG 286 

LAGRGWLQSSNHIQTSSLWW 49
CTG CCC GGC CGC GGC TGG CTG CAG TCT AGC AAC CAC ATC CAG ACA TCG TCG CTT TGG TGG 346 

RCFDEGGGSGSYDDGCQSLM 69
AGG TGT TTC GAC GAG GGC GGC GGC AGC GGC TCC TAG GAC GAT GGC TGC CAG AGC CTC ATG 406 

EYAWGRAAAATLFCGFIILC 89 
GAG TAC GCA TGG GGA CGA GCA GCT GCA GCC ACG CTT TTC TGT GGC TTT ATC ATC CTG TGC 466 

ICFILSFFALCGPQMLVFLR 109
ATC TGC TTC ATT CTC TCG TTC TTC GCC CTG TGT GGA CCC CAG ATG CTT GTT TTC CTG AGA 526 

VIGGLLALAAIFQIISLVIY 129
GTC ATT GGA GGC CTC CTC GCA CTG GCT GCC ATA TTC CAG ATC ATC TCC CTG GTA ATC TAC 586 

PVKYTQTFRLHDNPAVNYIY 149
CCC CTG AAG TAC ACA CAG ACC TTC AGG CTT CAC GAT AAC CCT GCT GTT AAT TAC ATC TAT 646 

NWAYGFGWAATIILIGCSFF 169
AAC TGG GCC TAT GGC TTC GCA TGG GCG GCC ACC ATC ATC TTC ATT GGT TGT TCC TTC TTC 706 

FCCLPNYEDDLLGAAKPRYF 189
TTC TGC TGC CTC CCC AAC TAC GAG GAT GAC CTT TTG GGG GCC GCC AAG CCC AGG TAC TTC 766 
TCTGGGAGGAAGAGCCTGAGAAAAGCCTCCTGCAACATGGATCTGAGGAGGAAACT 860



ACXTTTGGGCAATCTTCATATGATCAGAAATGCrAGAATAAATC 939 
ATGTATGTCGTCTGGAGTTAAAAAGACTTGAATTCTCTTTGCTAA 1018 
CCATTTAAGCTTCATTTGTTAAAGAATATGCCTGTGAAACTTC 1097 
CTGATGCGGCTTCTCTT T I TCCACATACAATGGG 1 rG TT T'CItJCTAAGGGCTACAGAGCAGGAAAGTCACTGGCAAAAC 1176 
TTCCGTGACCAAATATCCTGAAATTACTA H TTT f T AAAAAGACCTTATTTTGAGTTTTCAGTTACATAAAAAAGCAGA 1255 
AGCAGAf TCXSTTTCCTAAGTGACCAT<^TTTCTGAGAATTTIT^ IT T CTAAGCT 1334 

TCCTGTTGACTrTCTCrGATGCCTAGAAAAGTCTTCTAACGTA 1413 
GAA1 1 1 1LCILI 1 1 ICCCCTAGTGTACACGGGTAGGGTGTGGGAAGAAGCCGTGTTAGCACATCTGTAGT 1492 
GTATGCITAGAACCAGCCTAGACCCGATGGGAGGATGGACTAGGCCTAATCCCTC^ 1571 
AGGTAGGAAGGCACAGGAGGCTCACCACTCTCACACCAGTGCCATGC^ 1650

TTCTCAGTGCTTCTTCCCTTAACTGAGCTCT 1729

TAATTAAAAOTGGTCTTCCTTGGTAAGCAGACT^^ 1808 

TGTCrCTGAATACATACCGGAAGGGCTACTATTACCTTTTCCT^ 1887 

TTAACTATCAGAACACTATTTTGTAAGGTGCTCCAAAG^ 1966

TCTITCAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAA 2030 
MAGIPGLLFLLF 12
GGGCT GC TC GG OGCGGAACAGTCCTCGGC ATG GCA GGG ATT CCA GGG CTC CTC TTC CTT CTC TTC 144 

FLLCAVGOVSPYSAPWKPTW 32 
TTT CTG CTC TGT GCT GTT GGG CAA GTG AGC CCT TAC AGT GCC CCC TGG AAA CCC ACT TGG 204 

PAYRLPVVLPQSTLNLAKPD 52 
CCT GCA TAC CGC CTC CCT GTC GTC TTG CCC CAG TCT ACC CTC AAT TTA GCC AAG CCA GAC 264 

FGAEARLEVSSSCGPQCHKG 72
TTT GGA GCC GAA GCC AAA TTA GAA GTA TCT TCT TCA TGT GGA CCC CAG TGT CAT AAG GGA 324 

TPLPTYEEAKQYLSYETLYA 92
ACT CCA CTG CCC ACT TAC GAA GAG CCC AAG CAA TAT CTG TCT TAT GAA ACG CTC TAT GCC 384 

NGSRTETQVGIYILSSSGDG 112
AAT GGC AGC CGC ACA GAG ACG CAG GTG GGC ATC TAC ATC CTC AGC AGT AGT GGA GAT GGG 444 

AQHRDSGSSGKSRRKRQIYG 132 
GCC CAA CAC CGA GAC TCA GGG TCT TCA GGA AAG TCT CGA AGG AAG CGG CAG ATT TAT GGC 504 

YDSRFSIFGKDFLLWYPFST 152 
TAT GAC AGC AGG TTC AGC ATT TTT GGG AAG GAC TTC CTG CTC AAC TAC CCT TTC TCA ACA 564 

SVKLSTGCTGTLVAEKHVLT 172 
TCA GTG AAG TTA TCC ACC GGC TCC ACC GGC ACC CTG GTG GCA GAG AAG CAT GTC CTC ACA 624 

AAHCIHDGKTYVKGTQKLRV 192
CCT GCC CAC TGC ATA CAC GAT GGA AAA ACC TAT GTG AAA GGA ACC CAG AAG CTT CGA GTG 684 

GFLKPKPKDGGRGANDSTSA 212 
GGC TTC CTA AAG CCC AAG TTT AAA GAT GGT GGT CGA GGG GCC AAC GAC TCC ACT TCA GCC 744 

MPEQMKFQWIRVKRTHVPKG 232
ATG CCC GAG CAG ATG AAA TTT CAG TGG ATC CGG GTC AAA CGC ACC CAT CTC CCC AAG GGT 804 

WIKGNANDIGMDYDYALLEL 252
TGC ATC AAG CGC AAT GCC AAT GAC ATC GCC ATC GAT TAT GAT TAT CCC CTC CTG GAA CTC 864 

KKPHKRKFMKIGVSPPAKQL 272 
AAA AAG CCC CAC AAG AGA AAA TTT ATG AAG ATT CGG GTG AGC CCT CCT GCT AAG CAG CTC 924 

PGGRIHFSGYDNDRPGNLVY 292 
CCA GGG GCC ACA ATT CAC TTC TCT CCT TAT GAC AAT GAC CCA CCA CGC AAT TTC GTG TAT 984 

RFCDVKDETYDLLYQQCDAO 312 
CCC TTC TGT GAC GTC AAA CAC GAG ACC TAT CAC TTG CTC TAC CAC CAA TCC CAT GCC CAC 1044 

PCASCSGVYVRMWKRQOOKW 332 
CCA CGG CCC AGC CCG TCT CGG GTC TAT CTC AGC ATC TCC AAG ACA CAC CAG CAC AAG TGG 1104 

ERKIIGIFSGHQWVDMNGSP 352
GAC CCA AAA ATT ATT GGC ATT TTT TCA GGG CAC CAG TGG GTG CAC ATG AAT GGT TCC CCA 1164 
QDFNVAVRITPLKYAQICYW 372
CAG GAT TTC AAC GTG GCT GTC AGA ATC ACT CCT CTC AAA TAT GCC CAG ATT TGC TAT TGG 1224 

IKGNYLDCREG* 384
ATT AAA GGA AAC TAC CTG GAT TGT AGG GAG GGG TGA 1260 
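Each one-letter residue line in this listing is the translation of the codon line beneath it, and OCR damage has left some pairs disagreeing. As a hedged aid for checking a line pair, the following minimal Python sketch assumes the standard genetic code (NCBI translation table 1) and a codon line already trimmed to the reading frame; `translate_codons` and `find_mismatches` are hypothetical helper names, not part of the original disclosure.

```python
# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# codon index = 16 * first + 4 * second + third, with T=0, C=1, A=2, G=3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_codons(codon_line: str) -> str:
    """Translate a space-separated codon line such as 'ATG AAT ATG 71'.

    Position numbers are skipped; any token that is not a valid codon
    (for example an OCR-damaged 'G7G') is reported as 'X'.
    """
    residues = []
    for token in codon_line.split():
        if token.isdigit():
            continue  # running position number, not sequence
        residues.append(CODON_TABLE.get(token.upper(), "X"))
    return "".join(residues)


def find_mismatches(protein_line: str, codon_line: str) -> list[tuple[int, str, str]]:
    """Return (index, listed residue, translated residue) for each disagreement."""
    # Keep residue letters and the stop symbol; drop spaces and position numbers.
    listed = "".join(ch for ch in protein_line if ch.isalpha() or ch == "*")
    translated = translate_codons(codon_line)
    return [
        (index, listed_aa, translated_aa)
        for index, (listed_aa, translated_aa) in enumerate(zip(listed, translated))
        if listed_aa != translated_aa
    ]


if __name__ == "__main__":
    print(translate_codons("ATG AAT ATG ACT CAA GCC CGG GTT 71"))  # MNMTQARV
    print(find_mismatches(
        "IKGNYLDCREG* 384",
        "ATT AAA GGA AAC TAC CTG GAT TGT AGG GAG GGG TGA 1260",
    ))  # []
```

A non-empty result flags positions where either the residue or the codon was misread. For example, the MAQLGAVVAVASSFFCA line paired with its codons (ATG GCT ... TGT GCA) yields (8, 'A', 'P') and (12, 'S', 'T'), which should be checked against the original filing rather than corrected by guesswork.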

CACAGTCTTCCCTCCTG G CAGCAATTAAGGCTCrTCAW^ n iTAXjT CATTGG 1339

CGTGCAC^UyiXSTGTGTGTGTCT G TG T GTCr A AGGlX^ 1418 

GGCTTTACTATTTGAAAACI W lTltjl'GT A TCATATCATATATCATTT^ 1497

ATAAAAAAAATACIX5ATTXX3GGGCAATGAGGAATATTTGAC 1576 

TTATTTCATCrGAACrTGTTTCAAAGATTrATACT 1655 

GTC T GTTTTC I 'X C 1 ' G AGATTCATLTrCG T GC TGG GT ITI ri ' ltjlTlTlVlA ATTCAGTGCCTGATCrTTAATGCTTCCA 1734 

TAAGGCAGTGTTCCCATTTAGGAACTTTGACAGCATTTGTTAGG^ 1813 

GTCTTTGAACAGTAAAATT^TGTGTTCACTATACTGAT^ 1892 

GCI^TlTrACTl'IXrCAAAAATAGTi'TCTX I rCCAAAGGTTGTTGCTCTACTTTCTAGGAAGTC 1 TTGCATATGGCCCTC 1971 

CCAACTTTAAAGTCATACCAGAGTGGCCAAGAGTGTTTATCCC^ 2050 

GGAACTAGCTATTTTrCAGAAGACAATAATCAGGGCTTAATTAGAA 2129 

CCACACTAAAAACAATCATAGCATTTTACCCCTGGATTATAGC^ 2208 

AAATGAATTAAATTCCACyVGAACAATGGAAGCATTGCCTGCCAGATCT 2287 

CCACAGTCCTCCACCCnXIATCAAAAATTATTCrCCATA G ' lT^ 1 1 ITT CAATT 2366 

TGGAAA CTl l rCTCTCT CATTTATAGTGAAAATACTTGGAAGTTACITrAAGAAAACCAGTGTGGCC I 1 i 1 lOX TCTA 2445 

GCTTTAAAAGCCCCC C ' l TT TGCTGG AATGCTCTAGGTTATAGATAAACAATTAGGTATAATAGCAAAAATG 2524 

AAGAATGCAAAATGGATCAGAATCATGCCTTCCAATAAAGGCCTTTACACA rCTT T T ATCAATATGATTATCAAATCAC 2603 

AGCATATACAGAAAAGAC 1 IGGACTTATI GTATGTTTTTA 1 1 'I TATGCCTCTCGCCCTAAGCACT 'I CI TTCTAAATGTA 2682 

TCGGAGAAAAAATCAAATGGACTACAAGCACGT G 1' I TGC 1 0 IXAl'l 1 UCACCCCAGGTAAACCTCCATTGTACCAATTTG 2761 

TAAGGATATTCAGATCGAGCACTOTCACTTAGACATTCTCTC I T ' i TCT G Ct rCTCT ' 1 TCTTG AG C 1 1 1 nU CAA 2840 

GGATAATTCTGATAAGGCACrCAAGAAACCTACAACCACAGTC 2919 

AAGGACATGCAGAGCCGCCAGGAAAATTCTGAGTTCCAGCACAA 11 tTCT I IGG AATCTAACACGAATCTAGCCTGACC 2998 

AAGAAGGGAGGTCT(XATTTCTA'ICTCrGGTATTTGGGG G 1 1 rTCTTTOI 11 1 1CC1 ITA G CTTGGTG AAAAAAACTTC 3077 

ACroAACACCAACACCAGAATGGA nT rTTT A A 3156 

ATTTT77GCAAAG7TAGACAATGGCACAAAGTOV\AATG 3235

GAATGATACACCCATATGCTATATACAGCTTAACTO\CAGAACTGTAAAAGAA 3314 

T CI \ 1 : 1 ACTGATAATAAAACAAAGCATGGTATTAAACTATCATAGAAGTAGACAGAAAAA 3393 
ATTATTAATATAATTA GTGC T T '1 ACATGTGTTAGTTATACATACTAGAAG^ 3472
ACATTTCCCAAAGTXr roC TCCT TA AACACTCATGCCITATt ^ 1 1 rCTA CCAAAAGTAAAAAGG GTTC TATTMGTCAG 3551 

AGGAAGATGCCTCTCCATTrTCCCTCTCTT^ 3630 

TGTTGTAAAGCGAOWVGTTGAGGTTCTAAAATCTGC\ 3709 

GGCCC 3714 
GIPGLFILLVLLCVFMQVSP 22
GGA ATC CCG GGG CTC TTC ATC CTT CTT GTC CTG CTC TGT GTG TTC ATG CAG GTG AGT CCC 213 

YTVPWKPTWPAYRLPVVLPQ 42
TAC ACC CTT CCG TGG AAA CCC ACA TGG CCG GCT TAT CGC CTC CCT GTA GTC TTG CCT CAG 273 

STLNLAKADFDAKAKLEVSS 62
TCT ACC CTC AAC TTA GCT AAG GCA GAC TTC GAC GCC AAA GCG AAA TTG GAG GTG TCC TCC 333 

SCGPQCHKGTPLPTYEEAKQ 82
TCA TGT GGA CCT CAG TGT CAC AAG GGA ACA CCA CTC CCC ACC TAC GAA GAG GCC AAG CAG 393 

YLSYETLYANGSRTETRVGI 102 
TAC CTT TCC TAT GAA ACC CTT TAT GCC AAT GGC AGC CGC ACA GAG ACT CGG GTG GCC ATC 453 

YILSMGEGRARGRDSEATGR 122
TAC ATC CTC AGC AAT GGT GAA GCC AGG GCA GGA GGC AGA GAC TCG GAG GCC ACA GGG AGA 513 

SRRKRQIYGYDGRFSIFGKD 142
TCT CGC AGG AAG AGG CAG ATT TAT GGC TAC GAT GGC AGG TTT AGC ATT TTT GGG AAG GAC 573

FLLNYPFSTSVKLSTGCTGT 162
TTC CTG CTC AAT TAT CCT TTC TCA ACA TCG GTG AAG TTG TCT ACT CGC TCC ACT GGC ACC 633 

LVAEKHVLTAAHCIHDGKTY 182 
CTC GTG GCA GAG AAG CAC GTC CTC ACT GCT GCC CAC TGC ATA CAC GAT CGG AAA ACC TAT 693 

VKGTQKLRVGFLKPKYKDGA 202 
GTG AAA GGG ACA CAG AAA CTC CGA CTG GGC TTC CTG AAG CCC AAG TAT AAA CAT GGT GCC 753 

EGDNSSSSAMPDKMKFQWIR 222
GAA CGG GAC AAC ACC TCG AGC TCA GCC ATG CCA GAC AAG ATG AAG TTT CAG TGG ATC CGC 813 

VKRTHVPKGWIKGNANDIGM 242
CTG AAA CGC ACC CAT GTG CCC AAG CGG TCC ATC AAG GGC AAT GCC AAT GAC ATC GCC ATC 873 

DYDYALLELKKPHKRQFMKI 262
GAT TAT GAC TAC CCC CTG CTG GAA CTC AAG AAA CCC CAC AAA AGA CAG TTC ATG AAG ATT 933 

GVSPPAKQLPGGRIHFSGYD 282
GGT GTG AGT CCT CCA GCG AAG CAG CTC CCA GGG GGC AGG ATC CAC TTC TCT GGT TAT CAC 993 

NDRPGNLVYRFCDVKDETYD 302
AAT CAC CCG CCC GGC AAT TTG GTC TAC CGC TTC TCT CAT CTC AAA GAT GAG ACC TAC GAC 1053 

LLYQQCDAQPGASGSGVYVR 322
CTT CTC TAC CAC CAG TGT CAC CCC CAG CCC CGG GCC AGT GCT TCA CGC GTC TAT GTG AGG 1113 

MWKRPQQKWERKIIGIFSGH 342 
ATG TGG AAG ACA CCA CAG CAG AAA TCG GAA AGA AAA ATT ATC GCC ATC TTT TCA CGG CAC 1173 

QWVDMNGSPQDFNVAVRITP 362
CAG TCG GTG CAC ATG AAT CCC TCT CCA CAG GAT TTC AAC GTG CCA GTT ACA ATC ACG CCT 1233 
LKYAQICYWIKGNYLDCREG 382
CTT AAA TAT GCC CAG ATT TGC TAT TGG ATT AAA GGA AAC TAC CTA GAT TGC AGG GAG GGG 1293 

* 383 

TGA 1296 

CATGCCT lTlVrUS CCAGCACC^ 1375 

GTGTGAGTCACATACTAT L 1 1 ITA CCTAGTATTCTTCAAATGGCAAAAATTATTCG 1454

GTGCGTTATAGCATTTAAGCAGTCTGAAAGCATA C ' r T I T G CATA 1533 

GACAAGGAAGTTAAACTTTCA <j1T1T11» GAGAATT(^ 1612 

ATAOSTGACaCACAaSGAATATGAATTCTTA X i GT 1691 

TrTTTAATtil VRXJTl'ATTATG CTTCCAGATAATGAT AGCAAAGTCTTCAATAGGCAATT r ATAA IXjlTXTCGATTCAAA 1770 

CATTTACGTAGTAGTCCTTGAAGAGAACAATAATTTATTGGCTA 1849 

ACAGAATTCCCACG CTGCTTTTAGTTTTGA AAATAAAACTTTC CC TT CT 1928 

MAPASRLLALWALA 14
GTCGACCCACGCGTCCGGGCTC ATC GCG CCG GCG TCG CGG TTG CTC GCG CTC TGG GCG CTG GCG 64 

AVALPGSCAEGDGGWRPGGP 34
GCT GTG GCT CTA CCC GGC TCC GGG GCG GAG GGC GAC GCC GGG TGG CGC CCG GGC GGG CCG 124 

GAVAEEERCTVERRADLTYA 54 
GGG GCC GTG GCG GAG GAG GAG CGC TCC ACG GTG GAG CGT CGG GCC GAC CTC ACC TAC GCG 184 

EFVQQYAFVRPVILQGLTDN 74 
GAG TTC GTG CAG CAG TAC GCC TTC GTC AGG CCC GTC ATC CTG CAG GGA CTC ACG GAC AAC 244 

SRFRALCSRDRLLASFGDRV 94 
TCG AGG TTC CGG GCC CTG TGC TCC CGC GAC AGG TTG CTG GCT TCG TTT GGG GAC AGA GTG 304 

VRLSTANTYSYHKVDLPFQE 114 
GTC CGG CTG AGC ACC GCC AAC ACC TAC TCC TAC CAC AAA GTG GAC TTG CCC TTC CAG GAG 364 

YVEQLLHPQDPTSLGNDTLY 134
TAT GTG GAG CAG CTG CTG CAC CCC CAG GAC CCC ACC TCC CTG GGC AAT GAC ACC CTG TAC 424 

FFGDNNFTEWASLFRHYSPP 154 
TTC TTC GGG GAC AAC AAC TTC ACC GAG TGG GCC TCT CTC TTT CGG CAC TAC TCC CCA CCC 484 

PFGLLGTAPAYSFGIAGAGS 174
CCA TTT GGC CTG CTG GGA ACC GCT CCA GCT TAC AGC TTT GGA ATC CCA GGA GCT GGC TCG 544 

GVPFHWHGPGYSEVIYGRKR 194
GGG GTG CCC TTC CAC TGG CAT GGA CCC GGG TAC TCA GAA GTG ATC TAC GGT CGT AAG CGC 604 

WFLYPPEKTPEFHPNKTTLA 214
TGG TTC CTT TAC CCA CCT GAG AAG ACG CCA GAG TTC CAC CCC AAC AAG ACC ACG CTG GCC 664 

WLRDTYPALPPSARPLECTI 234 
TGG CTC CGG CAC ACA TAC CCA GCC CTG CCA CCG TCT GCA CCG CCC CTG GAG TGT ACC ATC 724 

RAGEVLYFPDRWWHATLNLD 254 
CGG GCT GGT GAG GTG CTG TAC TTC CCC CAC CCC TCG TCC CAT CCT ACC CTC AAC CTT GAC 784 

TSVFISTFLG* 265 
ACC ACC GTC TTC ATC TCC ACC TTC CTC GCC TAG 817 

CCAAAACAGCTTCCACWACTCCCGCTC^CAC 896 

GCGGCAATCGCCTCAGCCCAGCCCACCCTCACCTC C ' r 975 

GATCCTCAGAGGGGAAACACTCCAGAGTCCAACACCAGAACTTG 1054 

TGTATACXXXJCCCCGCXJCTTCra 1133 

CACCCAGCCATTCTCACAC^TGAATGCGTCAATAACCTCCT^ 1212 

CGGCTCCGGGTCACGGGGTCAAAATGACCCACACGCTGCAGTGACAAGAA 1291 

CATCCCALTCWCCCTGCTCCCCCACC^ 1370 

CAL^^CGTCATCCACCCLT'CCrc^ 1449
TGGCACAGGGGCACACCGTCAGAGGCT^^ 1528

ACTCACCrTCCTCTTCTCATCra 1607 

GCTGTGCTTGGGGGAGACACCCCACCTCCCTCCTCCATGGG 1686 

CGTGGCTGTCCTCCTCATCACCCTCCTGGTTTCGCTG 1844 

AGATTCACCTGGCCAGATGTCT^ 1923 
GTAAAGCCTTCCATAAACAAAAAAAAAAAAAAAAAAAAGGGCGGCCG 1970 
MAAAGRRGLLLLFV 14
GTCGACCCACGCGTCCCGTTC ATG GCG GCG GCT GGG CGG CCC GGT CTG CTT TTG CTC TTT GTA 63 

LWMMVTVILPASGBGGWKQN 34 
CTA TGG ATG ATG GTG ACT GTG ATT CTG CCT GCC TCT GGC GAA GGG GGA TGG AAA CAG AAT 123 

GLGIAAAVMEEERCTVERRA 54
GGG CTG GGA ATT GCA GCA GCA GTA ATG GAG GAG GAG CGT TGC ACA GTG GAG CGT CGG GGA 183 

HITYSEFMQHYAFLKPVILQ 74 
CAC ATC ACG TAC TCC GAA TTC ATG CAG CAC TAT GCC TTC CTC AAG CCC GTC ATC TTG CAA 243 

GLTDNSKFRALCSRENLLAS 94
GGA CTC ACG GAC AAC TCG AAG TTC CGG GCC CTG TGT TCC CGG GAA AAC CTG CTA GCC TCG 303 

FGDNIVRLSTANTYSYQKVD114 
TTC GGG GAC AAC ATT GTT CGC TTG ACT ACA GCC AAC ACC TAC TCC TAC CAG AAA GTG GAC 363 

LPFQEYVEQLLQPQDPASLG 134
CTG CCC TTC CAG GAA TAT GTG GAA CAG CTG CTG CAG CCC CAG GAT CCT GCA TCC CTA GGC 423 

NDTLYFFGDNNFTEWASLFQ 154
AAT GAC ACC CTG TAC TTT TTT GGA GAC AAC AAC TTC ACT GAG TGC GCA TCC CTC TTC CAG 483 

HYSPPPFRLLGTTPAYSFGI 174 
CAC TAC TCT CCG CCA CCA TTC CGT CTC CTG GGA ACC ACC CCT GCT TAC AGC TTT GGA ATT 543 

AGAGSGVPFHWHGPGFSEVI 194 
GCA GGA GCT GGA TCT GGG GTA CCC TTC CAC TGG CAT GGG CCT GGT TTC TCA GAG GTT ATC 603 

YGRKRWFLYPPEKTPEFHPN 214 
TAT GGT CGG AAG CGC TGG TTC CTC TAC CCT CCT GAG AAG ACA CCT GAG TTC CAC CCT AAC 663 

KTTLAWLLEIYPSLALSARP 234
AAG ACC ACA TTC GCC TGG CTG CTG GAA ATA TAC CCA TCT CTA CCC CTG TCA GCA CGG CCT 723 

LECTIQAGEVLYFPDRWWHA 254
CTA GAA TGT ACC ATC CAG GCT CGT GAA GTA CTG TAT TTT CCT GAT CGG TCG TGG CAT GCC 783 

TLNLDTSVFISTFLG* 270
ACA CTC AAT CTG GAC ACC ACT CTC TTC ATT TCT ACC TTC CTT GGC TAG 831 

CCAGACAGGCAACTGGCAAGCCCACTGCACCAGCACATGCOIATCT 910 

CCAGCAGCAACCTCACCCCACCCTCACCCACT^ 989 

TATCCTCAGAAGGGGAGCAGTTCAGAACCCATCAGCAGGCCCGATGGGG 1068 

CTOTACCTTCCCrrCTCCAGATCCTCCTGCGCCAC^ 1147 

TTCTCAGAGATGAAAGCGTCAATCAC1T C CTT C ATGCCCAA 1226 

TCACAGCGTG\AAGTCGCCC^C^CCCrc(^C 1305 

CCCTCTCCATCKCCCGGTCTCCATCCGCCCTCCTC^ 1384 

ACTCGCTTrTAATGCACXXnTCCCCCATC 1463 
AGCACAAGGGGAAAATGTCTAGAACTGGACGGGGCTGTGGGGGTCAC 1542

XCCTCACCTTTCTTTTCTCGTCCACCTGAGAGAAGAGCTCATC 1621 

AGATCCACCAAAGGCTGGGGCACTTTTCATGCCAC^ 1779 

CTCA CGTUC T T GG CCTCAATGCAGGCCTGCTGGGCCCGGATGTGGCCATCATCTTCATG^ 1858

ACTCCTCCAGTTCCCTGAGGGTTAACCAGAACCTAGTTGGTGATCGCCCT 1937 

TCAGGCCTCTTTCCTCCTGGCXrTTCCC 2016 

CTGCTGGGCXSAGGGACCCACCTCTCTC^^ 2095 

GCGGCCG 2102
MDNRFA 6
CACGCGTCCGGCTGGCGGACCAGGJVGGATGCGCGACCACTCTGAATGCCAGA ATG GAT AAC CGT TTT GCT 70

TAFVIACVLSLISTIYMAAS 26 
ACA GCA TTT GTA ATT GCT TGT GTG CTT AGC CTC ATT TCC ACC ATC TAC ATG GCA GCC TCC 130 

IGTDFWYEYRSPVQENSSDL 46
ATT GGC ACA GAC TTC TGG TAT GAA TAT CGA ACT CCA GTT CAA GAA AAT TCC AGT GAT TTG 190 

NKSIWDEFISDEADEKTYND 66 
AAT AAA AGC ATC TGG GAT GAA TTC ATT AGT GAT GAG GCA GAT GAA AAG ACT TAT AAT GAT 250

ALFRYNGTVGLWRRCITIPK 86
GCA CTT TTT CGA TAC AAT GGC ACA GTG GGA TTG TGG AGA CGG TGT ATC ACC ATA CCC AAA 310 

NMHWYSPPERTESFDVVTKC 106
AAC ATG CAT TGG TAT AGC CCA CCA GAA AGG ACA GAG TCA TTT GAT GTG GTC ACA AAA TGT 370 

VSFTLTEQFMEKFVDPGNHN 126 
GTG AGT TTC ACA CTA ACT GAG CAG TTC ATG GAG AAA TTT GTT GAT CCC GGA AAC CAC AAT 430 

SGIDLLRTYLWRCQFLLPFV 146
AGC GGG ATT GAT CTC CTT AGG ACC TAT CTT TGG CGT TGC CAG TTC CTT TTA CCT TTT GTG 490 

SLGLMCFGALIGLCACICRS 166
AGT TTA GCT TTG ATG TGC TTT GGC GCT TTG ATC GGA CTT TGT CCT TGC ATT TGC CGA AGC 550 

LYPTIATGILHLLAGNYSDS 186
TTA TAT CCC ACC ATT GCC ACG GGC ATT CTC CAT CTC CTT GCA GGA AAT TAC TCA GAT TCT 610 

WLHE* 191
TGG CTC CAT GAA TAA 625 

TTTTAATGATCTTCTACATTATCCTTGATAATTACTCA 1 11 C 1 CAATAATCTTTTAATTTCATCCCATCACTCTGAGGA 704 

TAGCTTCCAAGCTCTTTAAATGGCCTTACAAACTCATTG 783 

CCAGTGGGCCATCCCTATGGTAGTTTAAAAACATCGCCTTAAAATCCrTQ 862 

CrTGAATCTAGGCTGGCTTGTGATGGTTTTGACQ 941 

ATCATOTGTCCTTAAACCAGTTCTCTrCGAACAC^ 1020 

AAGTCavC^CCACATCCACG'rCTCCrCTGTAGATGCTCCAG 1099 

TCATTTCCACCCATGTGTGGGAGCCATCCTGGATGTCCACCCTTAACAAGCCT^ 1178 

TCTTACTACATCCrrGTGAGACTCTAATAAAGAACC^CTA^ 1257 

ATGAATT G ? TCTI TTGTCCCCCTAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1308
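In these listings the number closing each protein line appears to be a running residue count in which the stop symbol * also occupies a position. For example, 186 plus the five symbols of WLHE* gives 191. Checking that arithmetic line by line is a cheap way to find lines where residues were dropped or duplicated during scanning. The Python sketch below assumes that counting convention. The function name check_running_counts is hypothetical.

```python
import re


def check_running_counts(protein_lines, start=0):
    """Check the running residue counts printed at the end of protein lines.

    Assumes each printed count equals the previous count plus the number of
    uppercase one-letter symbols on the line, with the stop symbol '*'
    counted as a position. Spaces, punctuation and lowercase OCR debris are
    ignored. Lines without a trailing number still add their residues.
    Returns a list of (expected, printed) pairs for inconsistent lines.
    """
    mismatches = []
    total = start
    for line in protein_lines:
        match = re.search(r"(\d+)\s*$", line)
        body = line[: match.start()] if match else line
        total += len(re.sub(r"[^A-Z*]", "", body))
        if match:
            printed = int(match.group(1))
            if printed != total:
                mismatches.append((total, printed))
                total = printed  # resynchronise so one bad line is reported once
    return mismatches
```

Called with the last two protein lines of the listing above and start=166, it returns an empty list. That result is consistent with the assumed convention for that stretch.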
AATTCGGMWCMKKKGVVCXJWGCCGGTGGAGTX3AGAGGATGGG ATG GAT AAC CGT 75

FATAFVIACVLSLISTIYMA 24
TTT GCT ACT GCG TTT GTG ATT GCT TGT GTG CTT ACT CTG ATT TCC ACC ATC TAC ATG GCG 135 

ASIGTDFWYEYRSPIQENSS 44
GCC TCC ATA GGC ACG GAC TTC TGG TAT GAG TAT CGA ACT CCC ATT CAA GAG AAT TCA ACT 195 

DSNKIAWEDFLGDEADEKTY 64 
GAC TCG AAT AAA ATC GCC TGG GAA GAT TTC CTC GCT GAC GAG GCG GAT GAG AAG ACT TAC 255 

NDVLFRYNGSLGLWRRCITI 84
AAC GAT GTT CTG TTC CGA TAC AAC GGC AGC TTG GGG CTG TGG AGA CGG TGC ATC ACC ATA 315 

PKNTHWYAPPERTESFDVVT 104
CCC AAA AAC ACT CAC TGG TAT GCG CCA CCG GAA AGG ACA GAG TCA TTT GAT GTG GTT ACC 375 

KCMSFTLNEQFMEKYVDPGN 124
AAA TGC ATG ACT TTC ACA CTA AAC GAG CAG TTC ATG GAG AAG TAT GTG GAC CCC GGC AAC 435 

HNSGIDLLRTYLWRCQFLLP 144
CAC AAT AGC GGC ATC GAC CTG CTT CGC ACC TAC CTG TGG CGC TGC CAG TTC CTT TTA CCC 495 

FVSLGLMCFGALIGLCACIC 164
TTC GTC AGC TTG GGC TTG ATG TGC TTT GGG GCG TTG ATT GGC CTC TGT GCC TGT ATC TCC 555 

RSLYPTLATGILHLLAGLCT 184
CGC AGC CTG TAT CCC ACC CTC GCC ACT GGC ATT CTC CAT CTC CTT GCA GCT CTG TGC ACA 615 

LGSVSCYVAGIELLHQKVEL 204
CTG GGC TCC GTG ACT TCC TAT GTT GCC GGC ATT GAA CTC TTA CAT CAG AAA CTA GAG CTG 675 

PKDVSGEFGWSFCLACVSAP 224 
CCC AAG GAT CTA TCT GGA GAA TTT GGA TGG TCC TTC TGC CTG GCC TGC GTC TCG GCT CCC 735 

LQFMAAALFIWAAHTNRKEY 244
TTA CAC TTC ATG GCG GCC GCT CTC TTC ATC TGG GCT GCC CAC ACC AAC CCG AAA GAG TAC 795 

TLMKAYRVA* 254
ACC TTA ATG AAG GCT TAT CGT GTC GCA TGA 825 

AGGGAGGCTGCCTGC'ri'AATGATTAATA' rrrr TCATACA 1 1 ITTTT 871 
HUMAN TANGO 215

Input File: tag215; Output File: tag215.pat; Sequence length 2747

MELGCWTQLG 10 

TCCCCAGTAGACGCTCCGGCACCAGCCGCGGCAAGG ATG GAG CTG GGT TGC TGG ACG CAG TTG GGG 66 

LTFLQLLLISSLPREYTVIN 30

CTC ACT TTT CTT CAG CTC CTT CTC ATC TCG TCC TTG CCA AGA GAG TAC ACA GTC ATT AAT 126 

EACPGAEWNIMCRECCEYDQ 50

GAA GCC TGC CCT GGA GCA GAG TGG AAT ATC ATG TGT CGG GAG TGC TGT GAA TAT GAT CAG 186 

IECVCPGKREVVGYTIPCCR 70 

ATT GAG TGC GTC TGC CCC GGA AAG AGG GAA GTC GTG GGT TAT ACC ATC CCT TGC TGC AGG 246 

NEENECDSCLIHPGCTIFEN 90 

AAT GAG GAG AAT GAG TGT GAC TCC TGC CTG ATC CAC CCA GGT TGT ACC ATC TTT GAA AAC 306 

CKSCRNGSWGGTLDDFYVKG 110

TGC AAG AGC TGC CGA AAT GGC TCA TGG GGG GGT ACC TTG GAT GAC TTC TAT GTG AAG GGG 366 

FYCAECRAGWYGGDCMRCGQ 130

TTC TAC TGT GCA GAG TCC CGA GCA GGC TGG TAC GGA GGA GAC TGC ATG CGA TGT GGC CAG 426 

VLRAPKGQILLESYPLNAHC 150

GTT CTG CGA GCC CCA AAG GGT CAG ATT TTG TTG GAA AGC TAT CCC CTA AAT GCT CAC TGT 486 

EWTIHAKPGFVIQLRFVMLS 170

GAA TGG ACC ATT CAT GCT AAA CCT GGG TTT GTC ATC CAA CTA AGA TTT GTC ATG TTG AGC 546 

LEFDYMCQYDYVEVRDGONR 190 

CTG GAG TTT GAC TAC ATG TGC CAG TAT GAC TAT GTT GAG GTT CCT GAT CGA GAC AAC CCC 606 

DGQIIKRVCGNERPAPIQSI 210 

GAT GCC CAG ATC ATC AAG CCT GTC TGT GGC AAC GAG CGG CCA GCT CCT ATC CAG AGC ATA 666 

GSSLKVLPHSDGSKNFDGFK 230

GGA TCC TCA CTC CAC GTC CTC TTC CAC TCC GAT GGC TCC AAG AAT TTT GAC GCT TTC CAT 726 

AIYEEITACSSSPCFHDGTC 250 

GCC ATT TAT GAG GAG ATC ACA CCA TGC TCC TCA TCC CCT TGT TTC CAT GAC GGC ACG TGC 786 

VLDKAGSYKCACLAGYTGQR 270

GTC CTT GAC AAG GCT CGA TCT TAC AAG TGT GCC TGC TTC GCA GCC TAT ACT GGG CAG CCC 846 

CENLLEBRNCSDPCGPINCY 290 

TGT GAA AAT CTC CTT GAA CAA AGA AAC TGC TCA GAC CCT GGG GGC CCC ATC AAT GGG TAC 906 

QKITGGPGLINCRHAKIGTV 310

CAG AAA ATA ACA CGG GCC CCT GGG CTT ATC AAC CGA CCC CAT GCT AAA ATT GGC ACC GTT 966 

VSFFCYNSYVLSGNEKRTCQ 330
GTG TCT TTC TTT TGT TAC AAC TCC TAT GTT CTT ACT GGC AAT GAG AAA AGA ACT TGC CAG 1026 

QNGEWSCKQPICIKACREPK 350
CAG AAT GGA GAC TCG TCA CGG AAA CAG CCC ATC TGC ATA AAA GCC TGC CGA CAA CCA AAG 1086 

ISDLVRRRVLPMQVQSRETP 370
ATT TCA GAC CTG GTG AGA AGC AGA CTT CTT CCC ATG CAG GTT CAG TCA AGG CAG ACA CCA 1146 
LHQLYSAAFSKQKLQSAPTK 390
TTA CAC CAC CTA TAC TCA GCG CCC TTC AGC AAG CAG AAA CTG CAG ACT GCC CCT ACC AAG 1206 

KPALPFGDLPMGYQHLHTQL 410
AAG CCA GCC CTT CCC TTT GGA GAT CTG CCC ATG GGA TAC CAA CAT CTG CAT ACC CAG CTC 1266 

QYECISPFYRRLGSSRRTCL 430
CAG TAT GAG TGC ATC TCA CCC TTC TAC CGC CGC CTG GGC AGC AGC AGG AGG ACA TGT CTG 1326 

RTGKWSGRAPSCIPICGKIE 450
AGG ACT GGG AAG TGG ACT GGG CGG GCA CCA TCC TGC ATC CCT ATC TGC GGG AAA ATT GAG 1386 

NITAPKTQGLRWPWQAAIYR 470
AAC ATC ACT GCT CCA AAG ACC CAA GGG TTG CGC TGG CCG TGG CAG GCA GCC ATC TAC AGG 1446 

RTSGVHDGSLHKGAWFLVCS 490
AGG ACC AGC GGG GTG CAT GAC GGC AGC CTA CAC AAG GGA CCG TGG TTC CTA CTC TGC AGC 1506 

GALVNERTVVVAAHCVTDLG 510
GGT GCC CTG GTG AAT GAG CGC ACT GTG GTG GTG GCT GCC CAC TGT GTT ACT GAC CTG GGG 1566 

KVTMIKTADLKVVLGKFYRD 530 
AAG GTC ACC ATG ATC AAG ACA GCA GAC CTG AAA GTT GTT TTG GGG AAA TTC TAC CGG GAT 1626 

DDRDEKTIQSLQISAIILHP 550
GAT GAC CGG GAT GAG AAG ACC ATC CAG AGC CTA CAG ATT TCT GCT ATC ATT CTG CAT CCC 1686 

NYDPILLDADIAILKLLDKA 570
AAC TAT GAC CCC ATC CTG CTT GAT GCT GAC ATC GCC ATC CTG AAG CTC CTA GAC AAG GCC 1746 

RISTRVQPICLAASRDLSTS 590
CCT ATC AGC ACC CGA GTC CAG CCC ATC TGC CTC GCT GCC ACT CGG GAT CTC AGC ACT TCC 1806 

FQESHITVAGWNVLADVRSP 610 
TTC CAG GAG TCC CAC ATC ACT GTG GCT GGC TGG AAT CTC CTG GCA GAC GTG AGG AGC CCT 1866 

GFKNDTLRSGVVSVVDSLLC 630 
CGC TTC AAG AAC GAC ACA CTG CGC TCT GGG GTG CTC ACT GTG GTG CAC TOG CTG CTG TGT 1926 

EEQHEDHGIPVSVTDMMFCA 650 
GAC GAG CAG CAT GAG GAC CAT GGC ATC CCA GTG ACT GTC ACT GAT AAC ATG TTC TCT GCC 1986 

SWEPTAPSDICTAETGGIAA 670
AGC TGG GAA CCC ACT CCC CCT TCT GAT ATC TCC ACT CCA GAG ACA GGA GCC ATC GCG GCT 2046 

VSFPGRASPBPRHHLMGLVS 690 
GTC TCC TTC CCG CGA CGA GCA TCT CCT GAC CCA CCC TGG CAT CTG ATC GCA CTG GTC AGC 2106 

WSYDKTCSHRLSTAFTKVLP 710 
TGG ACC TAT GAT AAA ACA TGC ACC CAC AGG CTC TCC ACT GCC TTC ACC AAC GTG CTG CCT 2166 

FKDWIERNMK*
TTT AAA GAC TGG ATT GAA AGA AAT ATG AAA TGA 

ACCATGCTCATGCACTCCTrGAGAA C Tt J IT T CT CT ATATCCCTCT 2278 
GAAGTGTGATTITXJCCTCrTXyVACTrGGCT 2357 
CCTCCATTCCTCCTACGCTCATTCCVCCTC^ 2436
TTTCTTCAAAGAAGACCATATACAAAACCT^ 2515

GCCATCACXITTGACCAGGGAAGATCTG 2594 
GACAGCCCAGGGCAGCAGAGCTGGGATGTCGTGCATGCC r riX j ZG7j 
CCCCATCTCTTCTACACATTTTAATAAAATAAGGGTTGGCI TL 1U AACTACAAAAAAAAAAAAAAAAAAAAAAA 2747 
GTCGACCCACGCGTCCGGCGGCTAGGCCXrCCGTGCGCTGGAGACCTCCGCGCT 79

MGGPRGAGWVAA 12 
CCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG 145 

GLLLGAGACYCIYRLTRGRR 32
GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG 205 

RGDRELGIRSSKSAGALEEG 52
CGG GGC GAC CGC GAG CTC GGG ATA CCC TCT TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG 265 

TSEGQLCGRSARPQTGGTWE 72
ACG TCA GAG GGT CAG TTG TGC GGG CGC TCG GCC CGG CCT CAG ACG GGA GGT ACC TGG GAG 325 

SQWSKTSQPEDLTDGSYDDV 92
TCA CAG TGG TCC AAG ACC TCG CAG CCT GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT 385 

LNAEQLQKLLYLLESTEDPV 112
CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA 445 

IIERALITLGNNAAFSVNQA 132 
ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT 505 

IIRELGGIPIVANKINHSNQ 152
ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG 565 

SIKEKALNALNNLSVNVENQ 172
ACT ATT AAA GAG AAA CCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA 625 

IKIKIYISQVCEDVFSGPLN 192
ATC AAG ATA AAG ATA TAC ATC AGT CAA GTA TGT GAG GAT CTC TTC TCT GGT CCT CTG AAC 685 

SAVQLAGLTLLTNMTVTNDH 212 
TCT GCT GTG CAG CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC CAC 745 

QHMLHSYITDLFQVLLTGNG 232
CAG CAC ATC CTT CAC AGT TAC ATT ACA CAC CTG TTC CAG GTG TTA CTT ACT GGA AAT GGA 805 

NTKVQVLKLLLNLSENPAMT 252
AAC ACG AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTC TCT CAA AAT CCA GCC ATG ACA 865 

EGLLRAQVDSSFLSLYDSHV 272
CAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TCC CTT TAT GAC AGC CAC GTA 925 

AKEILLRVLTLFONIKNCLK 292 
CCA AAC GAG ATT CTT CTT CCA GTA CTT ACC CTA TTT CAC AAT ATA AAG AAC TGC CTC AAA 985 

IEGHLAVQPTFTEGSLFFLL 312
ATA GAA GGC CAT TTA GCT GTG CAC CCT ACT TTC ACT CAA GGT TCA TTC TTT TTC CTG TTA 1045 

HGEECAQKIRALVDHHDAEV 332
CAT CCA CAA GAA TGT GCC CAG AAA ATA AGA CCT TTA GTT GAT CAC CAT CAT GCA GAG GTG 1105 

KEKVVTIIPKI* 344
AAG CAA AAC GTT CTA ACA ATA ATA CCC AAA ATC TGA 1141 

TTCXTTCATATTTTTCCAAAGAGTAATCCAGTCTGGATA 1220 
CTGCTAAATTTAAACAGTAAATATCAC ATl ' l rGTC ATTAAO^CACCTATAACTTGC CG T CC T TCT C A GATT TO 1299
ACTArTTIGATGCCAACTGAATATAAGAGCTlCT rTCTl rC T ATTI rGCTATTTGC AAATCCTT 1378 

GTT A l' L Tr C CCTACATGAAGTCCCACTAAC ^ ^ 1457 
ACTCATCTGAGACAGQ^TCAGTATTTGACTAAATCA l rCTTTC ACAACTGAATAGT C 1 ibi 1 CTTTTA GTAGCAATGAA 1536 
ATCCTAAGCnrrT G AGGCCATTCACCTGGCA 1615 
TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGG 1694 
TACATATAAAATAGTGTGATCAATCACAATGTCCA TC rTTA GACA GTTOGTTA AATAAATTATC 1 GG rC TTT GA AAAGA 1773 
O CCIGCTGGGOSOGGTGGCTCTTGCCTCTAATCOCft 1852 
GTTTG AGACCftAGCCTCACCAATATGGAGAAA CC C^ 1931 
GCCTtrrAATCCCAGCTACTTGGGAGGCOGAGGCAGGAGAATTC 2010 
ATAGCGCCATTGCACTCCAGCCTCGGCAACAAGAGCAAAACT 2089 
TGTGCTTAAGTGGAAAGATATCTATGAAATA'l GG TGG 1 TTTTTAAAACACAAAAATTATAGAATA7GGGATCCCCTGTG 2168 
TGT G TGTCTGTGTGTGTGTGlVITjlxrr C 'rGTGTGTGTCT 2247

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTCGGGAATG CC T TCTA Q I TGAATTTTTT 2326 

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGC^ 2403 
TCCGCTCCMiGAAAAAG CTCCT T G CA^ 79

GGCTTCCGATTTTAGCAGGGC GG ITI CCGGAAGGCGGACCTCCAACCCCA'lTTCC'l TTCTCTGGGCTGGT TCTGCCCCA 158 

MGGAR 5

GCTGCA Q rr CC GT G TGGCCCIXXXrrCCTCGGCTCCCTGCAG ATG GGT GGC GCG CGG 229 

DVGWVAAGLVLGAGACYCIY 25

GAC GTG GGC TGG GTG GCA GCA GGG CTG GTC CTG GGC GCC GGC GCC TGC TAC TGT ATC TAG 289 

RLTRGPRRGVATMRPSRSAE 45 

CGG CTG ACT CGG GGA CCG CGG CGA GGC GTC GCG ACC ATG CGC CCT TCG CGA TCC GCA GAA 349 

DLTDGSYDDILNAEQLKKLL 65 

GAC CTA ACC GAT GGC TCC TAT GAC GAT ATC TTA AAT GCA GAG CAG CTT AAG AAA CTT CTG 409 

YLLESTDDPVITEKALVTLG 85

TAT CTG CTG GAG TCA ACC GAC GAT CCT GTC ATT ACT GAA AAG GCC TTG GTC ACC TTG GGA 469 

NNAAFSTNQAIIRELGGIPI 105

AAT AAT GCA GCC TTC TCC ACT AAC CAG GCC ATT ATT CGT GAG TTG GGT GGT ATC CCA ATT 529 

VGNKINSLNQSIKEKALNAL 125

GTT GGA AAC AAA ATC AAC TCC CTG AAC CAA ACT ATT AAA GAG AAA GCT TTA AAT GCA CTG 589 

NNLSVNVENQTKIKIYVPQV 145

AAT AAC CTG ACT GTG AAT GTT GAA AAT CAA ACT AAG ATA AAG ATA TAC GTC CCT CAA GTC 649 

CEDVFAD
TGT GAG GAC GTC TTT GCT GAC 
10 20 30 40 50

HUMAN MALLSRPALTLLLLLMAAWRCQEQAQTTDWRATLKTIRN^


MURINE M-VTPP.PAPARGPALLLLLLLATARGQEQDQTTDWRATLKTIRNGIKXIDTYLN^
10 20 30 40 50 

60 70 80 90 100 110 

GGEDGLCQYKCSDGSKPFPRYGYKPSPPNGCGSPLFGVHLNIGIPSLTKCCNQHDRCYET 

GGEDGLCQYKCSDGSKPVPRYGYKPSPPNGCGSPLFGVHLNIGIPSLTKCCNQHDRCYET
60 70 80 90 100 110 

120 130 140 150 160 170 

CGKSKMDCDEEFQYCLSKICRDVQKTLGLTQHVQACETTVELLFDSVIHLGCKPYLDSQR 

CGKSK^IDCDEEFQYCLSKICRDVQKTLGLSQNVQACETTVELLFDSVIHLGCKPYLDSQR 
120 130 140 150 160 170 

180 190 
AACRCHYEEKTDL



AACWCRYEEKTDL 
180 190 
10 20 30 40 50 60

MURINE MAQLGAWAVASSFFC^SLFSAVHXIEEGHIGVYYRGGALLTSTSGPGFKLMLPFITSYK

MAQLGAWAVASSFFCASLFSAVHKIEEGHIGVYYRGGALLTSTSGPGFHLMLPFITSYK
10 20 30 40 50 60

70 80 90 100 110 120 

SVQTTLQTDEVIQP/PCGTSGGVM 

SVQTTLQTDEVKHVPCGTSGGVMIYFDRI.^^ 

70 80 90 100 110 120

130 140 150 160 170 180

KKHIi^QFCSVHTLQEVYIELFDQIDENLKLALQQDLTSMAPGLVIQAVRVTKPNIPEAIR 


HHELNQFCSVHTLQEVYIELFDQIDENLKLALQQDLTSMAPGLVIQAVRVTKPNIPEAIR

130 140 150 160 170 180 

190 200 210 220 230 240 

R^/ELMESEKTKLLIAAQKQKVVEKEAETERKKALIEAEKVAQVAEITYGQKVMEKETEK 


RNYELMESECTKLLIAAQKQKV\^KEAETERKKALIEAEKVAQVAEITYGQKVMEKCT 

190 200 210 220 230 240 
10 20 30 40 50 60

HUMAN MNMTQARVLVAAWGLVAVLLYASIHKIEEGHLAVYYRGGALLTSPSGPGYHIML?FITT

MURINE 



70 80 90 100 110 120 

FRSVQTTLQTDE^/KWPCGTSGGVMIYIDRI 

KNVPCGTSGGVMIYIDRIEWNMLAPYAVFDIVRNYTADYDKTLIFN

10 20 30 40 

130 140 150 160 170 180

KIHHELNQFCSAHTLQEVYI ELFDQIDENLKQALQKDLNLMAPGLTIQAVRVTKPKI PEA 


KIHHELNQFCSAHTLQEVYIELFDQIDENLKQALQKDLNTMAPGLTIQAVRVTKPXIPEA 

50 60 70 80 90 100 

190 200 210 220 230 240 

IRRNFELMEAEKTKLLIAXQKQKVVEKEAETERKKAVIEAEKIAQVAKIRFQQKVMEKET

IRRNFELMEAEKTKLLIAAQKQKVVEKEAETERKRAVIEAEKIAQVAKIRFQQKVMEKET
110 120 130 140 150 160 

250 260 270 280 290 300 

EKRISEIEDAAFLAREKAKADAEYYAAHKYATSNKHKLTPEYLELKKYQAIASNSKIYFG 



EKRISEIEDAAFLAREKAKADAEYYAAHKYATSNKHKLTPEYLELKKYQAIASNSKIYFG 
170 180 190 200 210 220 

310 320 330 340 

SNIPNMFVDSSCALKYSDIRTCRESSLPSKEALEPSCENVIQNKESTG- 


SNIPSMFVDSSCALKYSDGRTGREDSLPPEEAREPSGESPIQNKENAGN 
230 240 250 260 270 
10 20 30 40 50 60

MURINE MKIXCLVAVVGCLLVTPAQANKSSE^IRCKCICPPYRN^
HUMAN MKLI>SLVAVVGCLLVPPAEANKSSEX)IRCKCICPPYROTSGHIYNQWSQKIX:NCLHVVE

10 20 30 40 50 60

70 80 90 100 110 120 

PMPVPGHDVEAYCIXCECRYEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRKPD


PMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVTYLSVVGALLLYMAFLMLVDPLI 

70 80 90 100 110 120 

130 140 150 160 170 180 

AYTEQLHNEEENEDARTMATAAASIGGPRA1TIVLERVEGAQQRWKLQVQEQRKTVTDRHK


AYTEQLHNEEENEDARSMAAAAASLGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDRHK
130 140 150 160 170 180 



MLS 
MLS 
10 20 30 40 50

HUMAN MATLW-GGLLRLGSLLSLSCLALSVLLLAQLSDAAKNFEDVRCKCICPPYKENSGHIYNK


MURINE MASLWCGNLLRLGSGLSMSCLALSVLLLAQLTGAAXNFEDVRCKCICPPYKENPGHIYNK
10 20 30 40 50 60 

60 70 80 90 100 110

NISQKXXTDCLHWEPMPVRGPDVEAYCLRCE^ 

NISQKIX:IX:LHVVEPMPVRGPDVEAYCLRCECKYEERSSVTIKVTIIIYLSILGLI^LYM

70 80 90 100 110 120 

120 130 140 150 160 170 

VYLTLVEPILKRRLFGHAQLIQSDDDIGDHQPFANAHD^ 

VYLTLVEPILKRRLFGHSQLLQSDDDVGDHQPFANAHDVIJ^ 

130 140 150 160 170 180 

180 190 

KLQVQEQRKSVFDRHWLS



KLQVQEQRKSVFDRHWLS 
190 
10 20 30 40 50 60

HUMAN MIRCGIACERCRWILPIJ,LI^AIAFDIIAIJVGRGWI^SSDHGQTSSLWWKC^

MURINE MU*CGIACERCRWILPLLI.T..S;^^

10 20 30 40 50 60 

70 80 90 100 110 120 

YEEGCQSLMEYAWGRAAAAMLFCGFIILVICFII^ 


YDDGCQSLMCTAWGRAAAATLFCGFIILCICFILSFFALCG^ 

70 80 90 100 110 120 

130 140 150 160 170 180

FQIISLVIYPVlOHtXTCFTLHANPAVTYIYNWAYGFGWAATIILIGCAFFFCCLPNYEDDL


FQIISLVIVPVKYTQTFRIJIDNPAVNYIYNWAYGFGWAATIXLXGCSFFFCGLPNYEDDL

130 140 150 160 170 180 

190 

LGNAKPRYFYTSAN 



LGAAKPRYFYPPAN 
190 
10 20 30 40 50

MO© NG MAGIPGL-FILLV^LCVTMQVSPYTVPW^ 


MO K AN MAGI PGLLF LLFFLLCAVGQVS? Y3APWKPTWPA YRLPWLPQSTLNLAKPDFGAZAKLE 

10 20 30 40 50 60 

60 70 80 90 100 110 

VSSSCGPQCHKGTPLPTYEEAKQYLSYETLYANGSRTETRVGIYILSNGEGRARGRDSEA

VSSSCGPQCHKGTPLPTYSEAKQYLSYETLYANGSRTETQVGIYILSSSGDGAQKKDSGS 
70 80 90 100 110 120 

120 130 140 150 160 170 

TGRSRRKRQIYGYIX^FSIFGXDFLLNYPFSTSVKLSTGCTGTLVAEKH\^TAAHCIHTC

SGKSRRKRQIYGYDSRFSIFGKDFLLNYPFSTSVKLSTGCTGTLVAEKHVLTAAHCIH^ 
130 140 150 160 170 180 

180 190 200 210 220 230 

KTYVXGTQKLRVGFLKPKYKDGAEGDNSSSSAMPDKMKFQWIRVKRTHVPKGWIKGNAND 


KTYVKGTQKLRVGFLKPKFKDGGRGANDSTSAMPEQMKFQWIRVKRTHVPKGWIKGNAND
190 200 210 220 230 240 

240 250 260 270 280 290 

IGMDYDYALLEXFCKPHKRQFmiGVSPPAKQLP^RIHFSGYDNDRPGNLVYRFCDVKDE 


IGMDYDYALLELKKPHKRKFMKIGVSPPAKQLPGGRIHFSGYDNDRPGNLVYRFCDVKDE 
250 260 270 280 290 300 

300 310 320 330 340 350 

TYDLLYQQCDAQPGASGSGVYVR1WKRPQQKWERKIIG 


TYDLLYQQCDAQPGASCSGVYVRMWKRQQOKWERKIIGIFSGHQWVDMNGSPQDFMVAVR
310 320 330 340 350 360 

360 370 380 

ITPLKYAQICYWIKGNYLDCREC 



ITPLKYAQICYWIKGNYLDCREG
370 380
HUMAN



10 20 30 

MAPASRLLALWALAAVALPGSGAEGDGGWRPGGPG-



40 50 
-AVAEE2RCTVERRADLT 



MAAAGRRGLLLLFVLWMMVTVILPAS---GEGGWKQNGLGIAAAVMEEERCTVERRAHIT
10 20 30 40 50 



60 70 80 90 100 110 

YAEFVQQYAFVRPVILQGLTDNSRFKALCSRD^ 

YSEFMQHYAFLKPVILQGLTDNSKFRALCSRENLLASFGDNIVRLSTANTYSYQKVDLPF
60 70 80 90 100 110 

120 130 140 150 160 170 

QEYVEQLLHPQDPTSLGNDTLYFFGDNNFTEWASLFRHYSPPPFGLLGTAPAYSFGIAGA 

QEYVEQLLQPQDPASLGNDTLYFFGDNNFTEWASLFQHYSPPPFRLLGTTPAYSFGIAGA 
120 130 140 150 160 170 

180 190 200 210 220 230

GSGVPFHWHGPGYSEVIYGRKRWFLYPPEKTPEFHPNKTTLAWLRDTYPALPPSARPLEC 

GSGVPFHWHGPGFSEVIYGRKRWFLYPPEKTPEFHPNKTTLAWLLEIYPSLALSARPLEC 
180 190 200 210 220 230 

240 250 260 

TIRAGEVLYFPDRWWHATLNLDTSVFISTFLG 



TIQAGEVLYFPDRWWHATLNLDTSVFISTFLG
240 250 260 



FIG. 29
10 20 30 40 50 60

HUMAN MD^FATAFVIACVLSLISTim^ASICTDFOT



MURINE MDNRFATAFVIACVLSLISTIY^^SIGTDFWYEYR^

10 20 30 40 50 60 

70 80 90 100 110 120 

EKTYNDALFRYNGTVGLWRRCITIPKNMHWYSPPERTESFDVVTKCVSFTLTEQFMEKF^ 

EKTYNDVLFRYNGSLGLWRRCITIPKNTHWYAPPERTESFDVVTKCMSFTLNEQFMEKYV
70 80 90 100 110 120 

130 140 150 160 170 180 

DPGNHNSGIDLLRTYLWRCQFLLPFVSLGLMCFGALIGLCACICRSLYPTIATGILHLLA



DPGNHNSGIDLLRTYLWRCQFLLPFVSLGLMCFGALIGLCACICRSLYPTLATGILHLLA
130 140 150 160 170 180 

190 200 210 220 230 240 

GLCTLGSVSCYVAGIELLHQKLELPDNVSGEFGWSFCLACVSAPLQFMASALFIWAAHTN



GLCTLGSVSCYVAGIELLHQKVELPKDVSGEFGWSFCLACVSAPLQFMAAALFIWAAHTN
190 200 210 220 230 240 

250 

RKEYTLMKAYRVA 



RKEYTLMKAYRVA 
250 



FIG. 30
10 20 30 40 50

NO £1)06 MGGARDVGWVAAGLVLGAGACYCIYRLTRGPRRGVATM--RPSRSAEDLTDGSYDDILNA

HUMAN MGGPRGAGWVAAGLLLGAGACYCIYRLTRGRRRGDEIELGIRSSKSAEDLTDGSYDDVLNA
10 20 30 40 50 60 

60 70 80 90 100 110

EQLKKLLYLLESTDDPVITEKALVTLGNNAAFSTNQAIIRELGGIPIVGNKINSLNQSIK

EQLQKLLYLLESTEDPVIIERALITLGNNAAFSVNQAIIRELGGIPIVANKINHSNQSIK
70 80 90 100 110 120 



120 130 140 150 

EKALNALNNLSVNVENQTKIKIYVPQVCEDVFA-



EKALNALNNLSVNVENQIKIKIYISQVCEDVFSGPLNSAVQLAGLTLLTNMTVTNDHQHM
130 140 150 160 170 180 



LHSYITDLFQVVLTGNGNTKVQXLKLLLNLAENPAMTEGLLRAQVDSSFLFLYDXHVAXE 
190 200 210 220 230 240 



XLLQYLRFSE 
250 



FIG. 31
bumutntallgn

ALIGN calculates a global alignment of two sequences 

version 2.0u. Please cite: Myers and Miller, CABIOS (1989)

> mutlSO 1570 aa vs.
> fautlSO 1203 aa
scoring matrix: pam120.mat, gap penalties: -12/-4
55.0% identity; Global alignment score: 2219



10 20 30 40 50 

GTCG ACCCACGCGTCCG - - -GGCCGGGGTCCTOA GCCGGAGCCGGAGCGCGCGCC 


GTCGACCCACGCGTCCGCGTGGATATGGACCTGGCTGCTGCCAAGTCCGG 
10 20 30 40 50 60 



60 70 80 90 

GCTGCCCAGC CC CGC - - - CGCGCCG-GCCCCGCAGAT -GGTGACT 



CCTGCCTAGCCCGTCCTGGGGACTCTGTGGGCACG 

70 80 90 100 110 120 



100 110 120 130 

C CGCGGCCCCC- - -GCCC-GCCCCGG-GCCCCGCGCTC- — CTCCTCCT 


130 140 150 160 170 180 



140 150 160 170 180 190 

CCTGCTGCTCGCCACTGCGCGCGGG- - - CACGAACAGGACGAGACCACCGACTGGAGGGC 

CCTCCTCATGGCCGCTCTTGTCAGGTGCGAGGAGCAGGC 

190 200 210 220 230 240 

200 210 220 230 240 250 

CACCCTCAAGACCATCCCCAACGGCATCCACAAGATACLACA 



CACCCTGAAGACCATCCCGAACGGCGTTCATAAGATAGACACGTACCTGA 
250 260 270 280 290 300 

260 270 280 290 300 310 

GGACCTGCTGGGCGGGGAGGACGGGCTCrcCCAGTACAAGTGCACCGACGGATCGAAGCC 



GGACCTCCTGGGAGGCCAGGACGCTCTCTGCCAGTATAAATGCAGTC 

310 320 330 340 350 360 

320 330 340 350 360 370 

TGTTCCACGCTATGGATATAAACCATCTCCACCAAATGGCTGTC 

TTTCCCACCTTATGGTTATAAACCCTCCCCACCGAA 

370 380 390 400 410 420 

380 390 400 410 420 430

CCTTCATCTGAACATACGTA r CC CTT CCC I CACCAACTGC7GCAACCACCACCACAGATG 



TCTTCATCTTAACArrGCTATCCCrrCCCTGACAAACTGI 1 :i;CAACCAACACGACAGCTG 
430 440 450 460 470 480 

440 450 460 470 480 490

CTATGACACCrcCCGCAAAAGCAAGAACGACTGTCACGAGGAGTTC 

rrATGACACCTCTCGCAAAACX^ACAATCACT^ 

490 500 510 520 530 540
500 510 520 530 540 550

CAAGATCTGCAGAGACGTGCAGAAGACGCTCGGACTATC^^ 

CAAGATCTGCCGAGATGTACAGAAAACACTAGGAC^ 

550 560 570 580 590 600

560 570 580 590 600 610

GACAACGGTGGAGCTCCTCTTTGACAGCGTCATC 


AACAACA<ntXjAGC7CTTGTTTGACAGTGTTATACATTO 
610 620 630 640 650 660 

620 630 640 650 660 

CAGCCAGCGGGCTGCATGCTGGTGTCGTTATGAAGAAAAAACAGA ACC 

CAGCCAACGAGCCGCATGCAGGTGTCATTATtlAAGAAAA 

670 680 690 700 710 720



670 680 690 700 710 720

CTGACTGCTGGAGAGCAGGCGAGAATGGAGGATCAT- CCTT - GCCAAAGATCGGATGCTT 


CCGACAGCTAGTGA- CAGATGAAGATGGAA3AACATACCTTTGACAAATAACTAA TGTTT 
730 740 750 760 770



730 740 750 760 770 780 

TAACAGCCTAATUTl UCCTTAG'ITT'ltJTli 1'CGATGGGTCA i i l i uAGACu i u i 1 ^i'ATACT 


TTACAACATAAAA C T GTC T TATTTT TCTC - - AAAGGATTA I I T ' 1G AGACCTTAAAATA- - 
780 790 800 810 820 830



790 800 810 820 830 840

CTCTCTTTTn r AGAACCTCAAAGTGAAAACGGTGGGGGGCCAGGCAGAAACAGAGGGAG 


ATTTATAT CTTGATGTTAAAACCT CAAAGCAAAAAAAGTGAGGG 

840 850 860 870 



850 860 870 880 890 900 

AGC^TCCTTGGCATGCCGAGCGAGCAC^ 

AGATAG TGAGGGGAGGGCA- - -C CCTTGTCTTC 

880 890 900 



910 920 930 940 950 960

rAAGCTCCTGTGACTTGGTGTTCAT 



- TCA - CCTATCTTCCCCA - 
910 920 



-GCATT-GCTC CCTTA CTT 

930 940 



970 980 990 1000 1010 1020

ACTTTGTACITAACAATAAAAATGAAAGCAAATGTAA T T TCAGC 

ACTA-TGC CAAATGT CTT 

950 

1030 1040 1050 1060 1070 1080 

ATTATTTTA , ' T T TGAAATACAC^CCAATCTTCCCTTAGAACTATTATTTATT'ri'CAAATT 



CACCAAT- ATC- - - AAAAACAAGTGCTTGTTTAG 

960 970 980
1090 1100 1110 1120 1130 1140

TCAGATGTACAT1TATACCTGGAAAAACTATTAATTCTCCATTTTTA 



- CGGA " GAA TTTTGA AAAGAGGAATA TATAACTCAATTTT 

990 1000 1010 1020 

1150 1160 1170 1180 1190 1200 

GTTGTTTCTCTGAAGCCCACTAAGATAGGTATAAATATGTrACTCAAAACTA 

CAC AAC--CACATTTA 

1030 1040 

1210 1220 1230 1240 1250 1260 

CCAAATGTGCATCTCTTGTACAGTTGGAATCACGGTTGCrrACTTCTCTC 



CCAAA AAAAGAGATCAAATATAAAATT 

1050 1060 

1270 1280 1290 1300 1310 1320 

CAGGACATCTGA GTG TT G GGATGTCCACAGAATTC 



CATCATAATGT - C TGTT - --CAACAT- - TATCT 

1070 1080 1090 

1330 1340 1350 1360 1370 1380 

ACC Li LIT AGACTTGAATCTCCTTO- il r CCIX*k-"i\i itiAGCl^r AGGAATGACGGu ill AAC 

TATTTG GAAAATGGGGAAATTATC 

1100 1110 

1390 1400 1410 1420 1430 1440 

CGCCCAACCCGACC l C iGAATCAGTG CGCTATCTG CTGCTGAGo 1 1 u i\KJTTACTCCCT C 



A - CTTACA AGTATTTGTTTACT 

1120 1130 1140 

1450 1460 1470 1480 1490 1500 

ATCCC CUTT T T CCATCTTCTATCCTGCACTA b 'l Xm I I aaaactctgacattttctaatgga 



ATGAAAT - TTTAAATAC - -ACATTT 

1150 1160 

1510 1520 1530 1540 1550 1560 

GvTrCTTAATAAAACCTATTTACCl CTTGCTAAAAAAAAAAAAAAAAAAAAAAAAAAGGGC 



ATGC CTAC AAAAAAAAAAAAAAAAAAAAAAAGCGC 

1170 1180 1190 

1570 
GGCCG- 



1200
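The header of this figure records that it was produced with the ALIGN program (a global alignment), scored with the PAM120 matrix and gap penalties of -12/-4, and that it reports 55.0% identity. The Python sketch below illustrates the underlying technique, a Gotoh-style global alignment with affine gap penalties, together with one way to compute percent identity. It is not ALIGN's code. It substitutes a simple match/mismatch score (+5/-4, an arbitrary choice) for PAM120, and it assumes that -12 is charged for the first position of a gap and -4 for each additional position. It also counts identity over all alignment columns, including gap columns. ALIGN's exact conventions may differ, so this sketch will not reproduce its scores or percentages. The function names are hypothetical.

```python
NEG_INF = float("-inf")


def global_align(a, b, match=5, mismatch=-4, gap_first=12, gap_extend=4):
    """Gotoh global alignment with affine gaps (illustrative scoring only).

    A gap of length k costs gap_first + gap_extend * (k - 1).
    Returns (score, aligned_a, aligned_b) with '-' marking gap positions.
    """
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -gap_first - gap_extend * (i - 1)
    for j in range(1, m + 1):
        Y[0][j] = -gap_first - gap_extend * (j - 1)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_first, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_first)
            Y[i][j] = max(M[i][j - 1] - gap_first, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_first)

    # Trace back from the best of the three matrices at the bottom-right cell.
    tables = {"M": M, "X": X, "Y": Y}
    state = max(tables, key=lambda k: tables[k][n][m])
    score = tables[state][n][m]
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if state == "M":
            prev = M[i][j] - (match if a[i - 1] == b[j - 1] else mismatch)
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            state = "M" if M[i][j] == prev else ("X" if X[i][j] == prev else "Y")
        elif state == "X":
            cur = X[i][j]
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
            state = "X" if X[i][j] - gap_extend == cur else (
                "M" if M[i][j] - gap_first == cur else "Y")
        else:
            cur = Y[i][j]
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
            state = "Y" if Y[i][j] - gap_extend == cur else (
                "M" if M[i][j] - gap_first == cur else "X")
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

With the affine penalty, the sketch prefers one long gap over several short ones. For example, it aligns AAAATTTT with AAAA as a single four-position gap (score -4) rather than splitting the gap.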
HUMAN



10 20 30 

TANGGATCGACCACGCGTYCGCCCACGCGT 

ACGCGTCCGCGGACGCGTGGGCGCGGACTGATGGCGTCATCGAAGCGACTGGCCCGGAAG 
10 20 30 40 50 60 

40 50 60 70 80 

CCGGTCGCGTGCTG AGGGGTGTGACGGTTT — TC - - TTGCTCGTGGGCTCGGACG AGTAC 


GAAGTAGGGTCCTCAGGGGTGTGGCGGTTTCTACGGTTGCACGGGGGTTCG^ 
70 80 90 100 110 120 

90 100 110 120 130 140 

GGAGCGCCTGCAGGGACAGCCTGGATAAAGGCTCACTGATGGCTCAGTTGGGAGCAGTTG 


GGAGCGCCTGGAGGGACAGCCTGGATACAGGTTCACTGATGGCTCAGTTGGGAGCTGTTG 
130 140 150 160 170 180 

150 160 170 180 190 200 

TGGCTGTGGCTTCCAGTTTCTTTTGTGCATCTCTCTTCTCA 




210 220 230 240 250 260 

AGGGACATATTGGGGTATATTACAGAGGCGGTGCCCTGCTGACTTCGACCACCGGCCCrc 

AGGGACATATTGGAGTATATTACAGACGTCGTGCCCTGCTGACCTCCACCACTGGCCCGG 
250 260 270 280 290 300 

270 280 290 300 310 320

GTTTCCATCTCATGCTCCCTTTCATCACATCATATAAGTCTGTGCAGACCACACTCCAGA 

GTTTCCATCTCATGCTCCCGTTCATCACA'rCCTATAAGTCTCTACAGACCACTCTCCAA^ 
310 320 330 340 350 360

330 340 350 360 370 380

CACATGAGGTGAAGAATGTACCTrGTGCGACTAGTGGTCGTCTGATGATCTACTTTGACA 






CTGATGAAGTGAAGAACGTACCATGTGGAACCAGTGGTGGTGTGATGATCT 
370 380 390 400 410 420 

390 400 410 420 430 440 

GAATTGAAGTGGTGAACTTCCTGCTCCCGAACGCAGTGTATGATATAGTC 



GAATTGAAGTGGTGAACTTCCTGGTCCCAAATGCAGTGTATG 
430 440 450 460 470 480 

450 460 470 480 490 500 

CTGCTGACTATGACAAGGCCCTCATCTTC\ACAAGATCCACCACGAACTGAACCAGTTCT 


CTGCAGACTATGACAAGGCCCTCATCTTCAACAAGATCCATCATGAGCTTAACCAGTTCT 
490 500 510 520 530 540 

510 520 530 540 550 560 

GCAGTGTGCACACGCTTC.^GAGGTCTACATTGAGCTGTITGATCAGATTGATGAAAATC 

GCAGCGTTCATACTCTTCAGGAAGTCTAT.VTCG^ 
550 560 570 580 590 600 

570 580 590 600 610 620 

TCAAACTGGCTTTGCAACAGGACCTGACCTCCATGGCCCCTGGGCT 


TCAAGTTGGCTTTGCAGCAGGACCTGACTTCCATGGCCCCTGGGCTTC 

610 620 630 640 650 660 

630 640 650 660 670 680 

TGCGGGTAACAAAGCCCAACATACCAGAGGCAATCCGCAGAAACTACGAGTTGATGGAAA 


TGCGAGTGACAAAGCCCAATATACCTGAGGCAATCCGCAGGAACTATGAGCTGATGGAAA 

670 680 690 700 710 720 

690 700 710 720 730 740 

GTGAGAAGACAAAGCTTCTCATTGCCGCCCAGAAACAGAAGGTGGTCGAAAAGGAAGCAG 

GCGAGAAGACGAACCTTCTCATTGCAGCCCAGAACCAGAAGGTGGTGGAAAAGGACGCAG 
730 740 750 760 770 780

750 760 770 780 790 800 

AGACAGAGCGGAAGAACGCGCTCATTGAGGCAGAAAAAGTGGCCCAGGTCGCTGAGATCA 


AAACAGAGAGGAAGAAGGCCCTCATTGAGGCAGAAAAAGTCGCACAGGTTGCAGAAATCA 
790 800 810 820 830 840 

810 820 830 840 850 860 

CCTACGCGCAGAAGCTGATGCAGAAGGAGACTGAGAAGAAGATTTCACAAATTGAAGATG 



CCTATGGGCAAAAGGTGATGG AG AAGG AGACAG AG AAG AATGTG AAAAG ATGTGTAG - TC 
850 860 870 880 890 900 

870 880 890 900 910 920 

CTGCATTT - CTGGCCCGGG AC AAGCC AAACCCAGATGCTG AGTCCTACACTG - -CTATCA 

CTGAGTTAACAGTT - -TGACAAGAGCCTAAGCATGCCCTTCAGGCAACACGTACCTCTGC 
910 920 930 940 950 960
930 940 950 960 970 980

AAATAGCCGAAGCCAATAAGCTGAAGCTAACCCCTGAATATCTGCAGCTGATGAAGTA^ 

GAGAAGGAGGAGGCA GCCATTTCTAACTC GTTTC T ATAG AAGCCCTGGGTAG 

970 980 990 1000 1010 

990 1000 1010 1020 1030 1040 

AGGCCATTGCTTCC AACAGCAAGATTTACTTTGGC AAAG ACA - TTCCTAACATGTTCATG 


ATGCCTCAGCA - - CGGTGCCTTTTCATGCTTTG ATTGACACTCAACCT - -CGGGAGGAAA 
1020 1030 1040 1050 1060 1070 

1050 1060 1070 1080 1090 1100 

GACTCTGCGGGCAGTGTGAGCAAGCAGTTTGA 

CCCTCTGCA— C- - -GTGACCTGTCAATATG- -GTGCTAAATGT- -GTCTATG GAC 

1080 1090 1100 1110 1120 

1110 1120 1130 1140 1150 

TTAG AAGATG AAC - CCTTGGAGA -CGGCC ACTAAGGAGAATTGAAAAAAACTTGAT 


CCTGCTCTCCGTCTCCAGGCAGTTCTACCGTATACTTGGACCCTTGGGTTATAGCTAGCC 
1130 1140 1150 1160 1170 1180 

1160 1170 1180 1190 1200 1210 

ATG ACTGC AAATG AT ACT - T AAGC AG ATCTTTATTTTTT AAG ATGAATCAG AATGTTCCT 

1190 1200 1210 1220 1230 

1220 1230 1240 1250 1260 1270 

CCCTCCCCGACTACCTTCTCTGACTGTCTTCCAGTTACTC 

..... .. ..... . .... • . . .. .* 

; z i . • ••• • ■ ......... ..... •«. ... .... .... 

CGCTACGC - -CTG - -TGC -CACGCAAAC - -CCTGTGCCTA - -GAACATAGCCTGGACGTC 

1240 1250 1260 1270 1280 

1280 1290 1300 1310 1320 1330 

ACTTAAATCCACTCCCTTTCTAGGGAAAGGAGGGTGGGGACTGATGATGGGGGGTTTTAT 
: : . . : : . : : ; : : : : : . . : : . . : . : : : ; . . . 

ACAGCTACTCTGTACATTTCT GCTTGGTTCATTCC - TCTGTAGTTGCACGGCTTAG A 

1290 1300 1310 1320 1330 1340 

1340 1350 1360 1370 1380 1390 

TTCAGGTAAGCAGTTTATATGACTTCCAATAAGATTTGTAAATCATC 

..#« ••«. .. «• ..... ....... 

; ••• ...... • . •■■ .... ..... .. .... .««•• .. . 

T- -GGAGAAACAAGAGTCTAACCTTCTCATGGTCCCAGTTT -TC - TCG ATTAG AC - TTCG 

1350 1360 1370 U80 1390 

1100 1410 1120 1430 1440 1450 

ACCTCTAGACACTAATTTTATCCTTTGA -GGCTGGCTTAATTAG - -GGATCCTGTCAT-T 

A - - TC AATATTCTTCTAA - ATCCTCTG AC AAATG ATCTAATT AG AAG AAATC AG ACCTCT 
1400 1410 1420 1430 1440 1450 

1460 1470 1440 1490 1500 1510 

AAGCAiiAGGGAGAAATGTAGAGTGTTACCTCCAACTCATTTGATTTCCCTTACTTGCGAA 
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TTCCTGTGTGCATTGCTGGGACAAATGCCTC CATTAGAAA ATTCAAAGAAA 

1460 1470 1480 1490 1500 

1520 1530 1540 1550 1560 

AATGCAGTCCAGTGTTCTCACCTCTG- - CCTCCAAGGTAGGAG ATGTCTGTGGGTGAGGC 

GTCATAATCGAGAAT - CTCTTTGGTGGTCCTCTAAGGCGGGT - -TGTTTTTCAATGTTGT 
1510 1520 1530 1540 1550 1560 

1570 1580 1590 1600 1610 1620 

TYWKCAACTGAGCAAATATGTGCCTGTGAGTTTGCCAGTAGAGCTGTC 

TG-TCTT -GGAGCTTGGAGGTGAAATTCAATGT TTAAAATTTTTAGGAAATTTATA 

1570 1580 1590 1600 1610 

1630 1640 1650 1660 1670 1680 

CAGAGAA - CATTTGACCTTCCTGGCATTCTTGTCTGCATGTGTC 

C^AAGAAACrTTTAAATAAAGTATATTGAATGT -GCCATGAAAAAAAAAAAAAAAAAAGG 
1620 1630 1640 1650 1660 1670 

-J 

1690 1700 1710 1720 1730 1740 

TGTGCTTTCTTGAGCCCTCATAAGGAAGTACTGGTG 

GCGGCCG 
1680 
[Figure: pairwise nucleotide sequence alignment, upper sequence positions 10 to 2830 aligned with lower sequence positions 240 to 2700]
[Figure: pairwise nucleotide sequence alignment, upper sequence positions 10 to 1560 aligned with lower sequence positions 10 to 1510]
[Figure: pairwise nucleotide sequence alignment, upper sequence positions 10 to 1570 aligned with lower sequence positions 10 to 1680]

rlG Jo C 1 * or*) 
[Figure: pairwise nucleotide sequence alignment, upper sequence positions 10 to 1050 aligned with lower sequence positions 10 to 1030]

FIG. 11 Ci** 1 *) 
[Figure, continued: pairwise nucleotide sequence alignment, upper sequence positions 1060 to 2060 aligned with lower sequence positions 1040 to 2030]

Pit 37 CHofH) 
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hukMO 



10 

GTCGACCCACGCGTCCG - 



GTCGACCCACGCGTCCGCGGACGCGTGGGCACTCGGCCACTCTGCGGAGCAGGCATGGGA 



10 



20 



30 



40 



50 



60 



20 30 40 50 60 70 

GCCGCGCGCTCTCTCCCGGCGCCCACACCTC^ 

• • • • ••«••••••••••*••••••••••••• »••■•••••* * 

•»••■•••• • • • • »♦»»••»••••»*•»»•».«••••••• • • •»»«»••••• • 

GCCGCGCGCGTCCTCCGGGCGCCCACACCTGTCTGAGCGGCGCA -CG -GCCGCGGCCCCG 

70 80 90 100 110 



80 



90 



100 



110 



120 



130 



GCGGGCTGCTCGGCGCGGAACAGTGCTCGGCATGGCAGGGA1 



GCGGGCTGCTCCACGCGGTA — GCACTCAGCATCGCTGGAATCCCGGGGCTCTTCATCCT 
X20 130 140 150 160 170 



140 



150 



160 170 180 190 

3AGCCCTTACAGTGCCCCCTCGAAACC 



TCTTGTC CTGC 

180 190 

200 210 
CACTTGGCCTGCATA 



\TGCAGGTGAGTCCCTACACCGTTCCGTGGAAACC 
200 210 220 230 

220 230 240 250 

3TCTTGCCCCAGTCTACCCTCAATTTAGCCAA 



CACATGGCCGGCTTATCGCCTCCCTGTAGTCTTGCCTCAGTCTACCCTCAACTTA 
240 250 260 270 280 290 

260 270 280 290 300 310 

GCCAGACTTTGGAGCCGAAGCCAAATTAGAAGTATCTTCTTCATGTGGACCCCAGTGTCA 

GGCAGACTTCGACGCCAAAGCGAAATTGGAGGTGTCCTCCTCATGTGGACCTC 

300 310 320 330 340 350 

320 330 340 350 360 370 

TAAGGGAACTCCACTCCCCACTTACGAAGAGGCCAAGCAATATCTGTCTTATGAAACGCT 

*•*»**»* »•*»*•**»** • * * * * • * * 

****** «*.#•• #»•*»***•**•*»**•*»* * * • • ******** * * 

CAAGGCAACACCACTGCCCACCTACGAAGAGGCCAAGCAGTACCTTTCCTATGAAACCCT 
360 370 380 390 400 410 

380 390 400 410 420 430 

CTATGCCAATGGCAGCCCCACAGACACGCAGG'rCGGCATCTACATCCTCAGCAGTAGTCG 

TTATGCCAATGGCAGCCCCACACACACTCGGGTGCGCATCTACATCCTCACCAATGGTGA 
4J0 -130 440 450 460 470 

440 4 50 4»i0 470 480 4*0 

AiiATGCc>#l?CCAACACiJliAGACnrA(7tfGTCrnrACCAAAGT(rrCGAAGGAAGC(7GCAGAT 
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AGGCAGGGCACGAGGCAGAGACTCGGAGGCCACAGGGAGATCTCGCAGGAAGAGGCAGAT 
480 490 500 510 520 530 

500 510 520 530 540 550 

TTATGGCTATGACAGCAGGTTCAGCATTTTTGGGAAGGACTTCCTGCTCAACTAC 

• •••••••• * » ••••••* • • •>••• 

• ■ • •■•••**«•••■*•••••■••■••••••• . . ••••• 

TTATGGCTACGATGGCAGGTTTAGCATTTTTGGGAAGGACTTCCTGCTCAA 

540 550 560 570 580 590 

560 570 580 590 600 610 

CTCAACATCAGTGAAGTTATCCACGGGCTGCACCGGCACCCTGGTGGCAGAGAAGCATG 

• ••*••••• • ■ • ■ ■••••••••*•••••••••■•(, ■ ■ 

• ■•••••••••••••••■••a ■•••••••••«•••■>■■>•■•• * • 

CTOUVCATCGGTGAAGTTGTCTACTGGCTGCACTGGCACCCTGGTGGCA 

600 610 620 630 640 650 

620 630 . 640 650 660 670 

CCTCACAGCTGCCCACTGCATACACGATGGAAAAACCTATGTGAAAGGAACCCAGAAGCT 

CCTCACTGCTGCCCACTGCATACACGATGGGAAAACCTATGTGAAAGGGACACAGAAACT 
660 670 680 690 700 710 

680 690 700 710 720 730 

TCGAGTGGGCTTCCTAAAGCCCAAGTTTAAAGATGGTGGTCGAGGGGCCAACGACTCCAC 
■■•■••••••■•••••••■>•■••••••■••«■«••• •••••• » 

CCGAGTGGGCTTCCTGAAGCCCAAGTATAAAGATGGTGCCGAAGGGGACAACAGCTCGAG 
720 730 740 750 760 770 

740 750 760 770 780 790 

TTCAGCCATGCCCGAGCAGATGAAATTTCAGTGGATCCGGGTGAAACGCACCCATGTGCC 

CTCAGCCATGCCAGACAAGATGAAGTTTCAGTGGATCCGCGTGAAACGCACCCATGTGCC 
780 790 800 810 820 830 

800 810 820 830. 840 850 

CAAGGGTTCCATCAAGCKXrAATGCCAATGACATCGGCATGGATTATGATTATGCCCTCCT 

• •!*■! •■•••i!ii!i!i>!!!!it!!t!i!!!{2'!!'i(i!'t! I * • • • * * I I 

CAAGGCGTGGATCAAGGCCAATGCCAATGACATCGGCATGGATTATGACTACGCCCTGCT 
840 850 860 870 880 890 

860 870 880 890 900 910 

GGAACTCAAAAAGCGCCACAAGAGAAAATTTATGAAGATTCGGGTGAGCCCTCCTGCTAA 

GGAACTCAAGAAACCCCACAAAAGACAGTTCATGAAGATTGGTGTGAGTCCTCCAGCGAA 
900 910 920 930 940 950 

920 930 940 950 960 970 

CCAGCTGCCAGC^CJGCAGAATTCACTTCTCTGGTTATCACAATGACCCACCAGGCAATTT 

GCAGCTCCCAGGGGGCAGGATCCACTTCTCTCCTTATGACAATCACCGGCCCGGCAATTT 
960 970 980 990 1000 1010 

980 990 1000 1010 1020 L0J0 

GGTGTATCCCTTCrG'WACGTCAAAGACGAGACCTATCACTTGCTCTACCAGCAATGCGA 

GCTGTACCGCTrcn JTi 7A*P JTCAAAGATC ■AGAirCTACGACCrTCTCTACCAGCAG'rG'rC A 
1020 1030 1040 1050 1060 1070
1040 1050 1060 1070 1080 1090

TGCCCAGCCAGGGGCCAGCGGGTC

CGCCCAGCCCGGGGCCAGTGGTTCAGGGGTCTATGTGAGGATGTGGAAGAGACCACAGCA 
1080 1090 1100 1110 1120 1130 

1100 1110 1120 1130 1140 1150 

GAAGTGGGAGCGAAAAATTATTGGCATTTTTC

GAAATGGGAAAGAAAAATTATCGGCATCTTTTCAG 

1140 1150 1160 1170 1180 1190 

1160 1170 1180 1190 1200 1210 

TTCCCCACAGGATTTCAACGTGGCTGTCAGAATCACT^

CTCTCCACAGGATTTCAACGTGGCAGTTAGAATCACGC 

1200 1210 1220 1230 1240 1250 

1220 1230 1240 1250 1260 1270 
CTATTGGATTAAAGGAAACTACCTGGATTGTAGTC 

CTATTGG ATTAAAGGAAACTACCTAGATTGCAGGGAGGGGTGACA -TGCGT — CTTCTTG 
1260 1270 1280 1290 1300 1310 

1280 1290 1300 1310 1320 1330 

GCAGCAATTAAGGGTCTTCATGTTCTTATTTT^ 

CCAGCACCAATGG - TCTTTTTGCACTC ATTGTAGG AG AGGC TAGCTTTTTATCATT 

1320 1330 1340 1350 1360 

1340 1350 1360 1370 1380 1390 

GGCGTGCACACGTGTGTGTGTGTGTGTGTGTGTAAGGTGTCT — 

G ACTCTTGTG GTGTGAGTCA CATAGTATCTTTTACCTAGT 

1370 1380 1390 1400 

1400 1410 1420 1430 1440 1450 

TTTCTTAC AATTGCAAGA - TG ACTGGCTTTACTATTTG AAAACTGGTTTC 

ATTCTTCAAATGGCAAAAATTATTGCCTATATC CGT- - 

1410 1420 1430 1440 1450 1460 

1460 1470 1480 1490 1500 1510 

CATATATCATTTAAGCAGTTTCAACGCATACTT^ 

- - TATAGC ATTTAAGCAGTCTG AAAGCAT ACTTTTGCATAGAG ACTTTAAA GTA 

1470 1480 1490 1500 1510 

1520 1530 1540 1550 1560 1570 

TTGGGGCAATGAGGAATATTTCACAATTAAGTTAATCTTCACGTTTTTG 

TTCGGCTAATAGGCCCTATTTGACAAGGAAGTTA-AACTT^ 

1520 1530 1540 1550 1560 1570 

1580 1590 1600 1610 1620 1630

rrmv \TTTCATCTCAAcrrnTTTC 
TTTTTGTCTGATCCAAACrTGCTTC

1580 1590 1600 1610 1620 1630 

1640 1650 1660 1670 1680 1690 

ATGAATTCTTATATGTGTGCATGTGT - -GTTTTCTTCTGAGATTCATCTTGGTGGTGGGT 

ATGAATTCTTATGTTTGTATATGTATATGl"rriVl"lXrrGAGAGTCAT 

1640 1650 1660 1670 

1700 1710 1720 1730 1740 1750 

TTTTTTCTTTTTTTAATTC^ 



- ATATTG ATATTTTTGTAATGTG — TGGT-TATTATGCTTCCA 

1680 1690 1700 1710 

1760 1770 1780 1790 1800 1810 

TTAGGAACTTTGACAGCATTTGTTAGGCAC^ 



GATAATGATAGCA 

1720 1730 

1820 1830 1840 1850 1860 1870 

TAGTCTTTGAACAGTAAAATGATGTGTTGACTATACTGAT 



1880 1890 1900 1910 1920 1930 

TATAGTAAACCAGTATCCCAAGCTGCITTTAGTTCCAAAA 



1940 1950 1960 1970 1980 1990 

TGTTCCTCTACTTTCTAGGAAGTCTTTGCATATGGCCCTC 



AACTCTT - -CAATAGGC 

1740 

2000 2010 2020 2030 2040 2050 

GAGTGGCCAAGAGTCTTTATCCCAACCCTTCCATTTAACAGGATTTC 



2060 2070 2080 2090 2100 2110

CAACTACCTATTTTTCAGAACACAATAATCACCGCT^ 



2120 2130 2140 2150 2160 2170

CCCACCAAACA(Vri^;T(XCcrACACTAAAAACAATCATAGCATTTTACCCCTGGATTATAG 



r/6. 38 (H of 7) 
2180 2190 2200 2210 2220 2230

CACATCTCATGTTTTATCATTTGGATGGAGTAATTTA 



AATTTATAATGTTTTGGATTC 

1750 1760 

2240 2250 2260 2270 2280 2290 

AATGGAAGCATTGCCTCGCAGATGTCACAACAGAATAACCAC X T GTTTG GA 



AAACATT 

1770 

2300 2310 2320 2330 2340 2350 

AGTCCTCCAGCCTGATCAAAAATTATTCTGCATAGTTTTCAGTGTC 



TACGTAGTAGTC 

1780 

2360 2370 2380 2390 2400 2410 

TGTACTTCTTCAATTTGGAAACTTTTCTCTCTC 



CTTGAAGAGAA 

1790 

2420 2430 2440 2450 2460 2470 

CTTTAAGAAAACCAGTGTGGCCTTTTTCCCTCTAGCTT^ 



CAATAA : 

1800 

2480 2490 2500 2510 2520 2530 

ATGCTCTAGGTTATAGATAAAOVATTAGGTATAATAGCAAAAATGAAAATTGGAAGAATC 



TTTATTGGCTATATTGATA 

1810 1820 

2540 2550 2560 2570 2580 2590 

CAAAATGGATCAGAATCATCCCTTCCAATAAAGGCCTTTACACATGTTTTATCAATATGA 



2600 2610 2620 2630 2640 2650 

TTATCAAATCACACCATATACAGAAAAGACTTGGACTTATTGTA TG TTTTTATTT^ 



2660 2670 2680 2690 2700 2710

CrCTCGGCCTAAGCACTTCTTTCTAAATGTATCGCAGAAAAAATCAAATGGACTACAACC 



2720 2730 2740 2750 2760 2770 

ACGTGTTTCCTGTCCTTCCACCCCACGTAAACCTCCA'r 



flC, 3S ($ of 7) 
CCCA TATAAG

1830 

2780 2790 2800 2810 2820 2830 

CAGATGGAGCACTGTCACTTAGACATTCTCTGGGGGATTTTCTGC 



- ACTGTATCTTA - 
1840 



2840 2850 2860 2870 2880 2890 

TTTTTGGAAGGATAATTCTGATAAGGCACTCAAGAAACGTACAACCACAGTGCTTT 



CAGTGCA 

1850 

2900 2910 2920 2930 2940 2950 

AAATCATATGAGAAATACTATGCATAGCAAGGAGATGCAGAGCCGCCAGGAAAATTCTGA 



-CAGA 



2960 2970 2980 2990 3000 3010 

GTTCC AGCACAATTTTCTTTGG AATCTAAC AGG AATCTAGC CTG AGG AAG AAGGG AGGTC 

ATTCC — CAC GC 

1860 

3020 3030 3040 3050 3060 3070 

TCCATTTCTATGTCTGGTATTTGGGGGTTTTGTTTGTT 



-TGCTTT- 
1870 



3080 3090 3100 3110 3120 3130

AAGTTCACTGAACACCAAGACCAGAATGCA l TT rrrr AAAAAAATAGATGTTCCTTTTGT 



3140 3150 3160 3170 3180 3190 

GAAGCACCTTCATTCCTTCATTTTCATTTTTO 



TAGTTTTGA 



3200 3210 3220 3230 3240 3250

AATGAAATCAATGTTTAGTTCACAAGTAGATGTAATTTACTAAAGAATGATACACCCATA 



3260 3270 3280 3290 3300 3310

TGCTATATACAGCTIVXACTCACAGAACTGTAAAAGAAAA'rTATAAAATAATTCAACA'njT 



AAATAAAA'." 

1880
3320 3330 3340 3350 3360 3370

CC^TCTTTTTAGTGATAATAAAAGAAAGCATGGTATTAAACTATCATAGAAGTAGACA^ 



3380 3390 3400 3410 3420 3430

AAAAGAAAAAAGGACTCATGGCATTATTAATATAATTAGTGCTTTACATGTG 



3440 3450 3460 3470 3480 3490 

ACATATTAGAAGCATATTTGCCTAGTAAGGCTAGTAGAACCACATTTCCCAAAGTGTGCT 



TTTCCC 

1890 

3500 3510 3520 3530 3540 3550 

CCTTAAACACTCATGCCTTATGATTTTCTACCAAAAGTAAAAAG 



TTGTAAAAAA 

1900 

3560 3570 3580 3590 3600 3610 

AGGAAGATGCCrcTCCATTTTCCCTCTCTTTA 



3620 3630 3640 3650 3660 3670 

TAAAAGCTCTGGGAAGACCTCrrTCrrAAAGGGACAAGTTC 



3680 3690 3700 3710 

AATAAACATCTTTGATCACAAAAAAAAAAAAAAAGCGCGGCCG 

AAAAAAAAAAAAAAAGGGCGGCCG 

1910 1920 



f(G 39 Clofl) 
10 20 30 40
GTCGACCCACGCGTCCGGGCTCATGGCGCCGGC GTCGCGGT- 



-TGCTC— 



GTCGACCCACGCGTCCXMT-TCATGGCGGCGGCTGGGCGGCGCGGTCTGCTTTTGCT 
10 20 30 40 50 

50 60 70 80 90 100

-GCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGG-GGCGGAG 

TGTACTATGGATGATGGTGACTGTGATTCTGCCTGCCTC^ 
60 70 80 90 100 110

110 120 130 140 150 160 

GGTGGCGCCCGGGCGGGCCGGGGGCCGTGGCGGAGGAGGAGCGCTGCACGGTGGAGCGTC 

GAATGGGCT-GGGAATTGCAGCAGCAGTAATGGAGGAGGAGCGTTGCACAGTGGAGCGTC 
120 130 140 150 160 170 

170 180 190 200 210 220 

GGGCCGACCTCACCTACGCGGAGTTCGTGCAGCAGTACGCCTTCGTCAGGCCCGTCATCC

GGGCACACATCACGTACTCCGAATTCATGCAGCACTATGCCTTCCTCAAGCCCGTCATCT 
180 190 200 210 220 230 

230 240 250 260 270 280 

TGCAGGGACTCACGGACAACTCGAGGTTCCGGGCCCTGTGCTCCCGCGACAGGTTGCTGG

TGCAAGGACTCACGGACAACTCGAAGTTCCGGGCCCTGTGTTCCCGGGAAAACCTGCTAG 
240 250 260 270 280 290 

290 300 310 320 330 340 

CTTCGTTTGGGGACAGAGTGGTCCGGCTGAGCACCGCCAACACCTACTCCTACCACAAAG

CCTCGTTCGGGGACAACATTGTTCGCTTGAGTACAGCCAAC^ 
300 310 320 330 340 350 



350 360 370 380 390 400

TGGACCTGCCCTTCCAGGAATATGTGGAACAGCTGCTGCAGCCCCAGGATCCTGCATCCC 
360 370 380 390 400 410 



410 420 430 440 450 460 

TGGGCAATGACACCCTGTACTTCTTCCGGGACAACAACTTCACCGAGTGGGCCTCTCTCT 



TAGGCAATGACACCCTGTACrriTrrGCAGACAACAACTTCACTGAGTGGGCATCCCrrCT 
420 430 440 450 460 470



470 480 490 500 510 520 

TTCC^CACTACrcCCCACCCCCATTTCCCCTC 



fT6 39 CiorU>) 
TCCAGCACTACTCTCCGCCACCATTCCGTCTCCTGGGAACCACCCCTGCTTACAGCT^
480 490 500 510 520 530 

530 540 550 560 570 580 

GAATCGCAGGAGCTGGCTCGGGGGTGCCCTTCCACTGGCATGGACCCGGGTACTCAGAAG 

GAATTGCAGGAGCTGGATCTGGGGTACCCTTCCACTGGCATGGGCC 
540 550 560 570 580 590 

590 600 610 620 630 640 

TGATCTACGGTCGTAAGCGCTGGTTCCTTTACCCACCTGA 

TTATCTATGGTCGCAAGCGCTGGTTCCICTACCCTCCTGAGAAGACAC 
600 610 620 630 640 650 

650 660 670 680 690 700 

CCAACAAGACCaXCGCTGGCCTGGCTCCG03ACACATACCCAGCCCTCCCACCGTCTGC 

CTAAGAAGACCACATTGGCCTGGCTGCTGGAAATATACCCATCTCTAGCCCTGTC^ 
660 670 680 690 700 710 

710 720 730 740 750 760 

GGCCCCTGGAGTGTACCATCCGGGCTGGTGAGGTGCTGTACTTCCCCGACCGCTGGTGGC 

GGCCTCTAGAATGTACCATCCAGGCTGGTGAAGTACTGTATTTTCCTGATCGGTGGTGGC 
720 730 740 750 760 770 

770 780 790 800 810 820 

ATGCTACCCTCAACCTTGACACCAGCGTCrrCATCTCCACCTTCCTCGGCTAGCCAAAAC 

ATGCCACACTCAATCTGGACACCAGTGTCTTCATTTCTACCTTCCTrGGCTAGCCAGA-C 
780 790 800 810 820 830 

830 840 850 860 870 880 

AGCTGGCAGG ACTGCCGGTCACA - CACCAGCACGTCCCACC -TCGTGCTCACGG ATTTTA

AGGCAACTGCCAACCC- - -CACTCCACCAGCACATCCCAATCTAGTGCTCACAGACTTTA 
840 850 860 870 880 890 

890 900 910 920 930 940 

TT.^CACAGATAGTGGCGGCAATGGCCTCAGCCCAGCCCACCCTCACCTGCTTTTCCAGCC 

TTACA -GGACACTCGCAGCAGCAGCAAC — CTCAGCCCACCCTCACCCACTCT -CCAGCC 
900 910 920 930 940 950 

950 960 970 980 990 

C.*Ca*AAGCGCGACGA TCACGGCCCACCAAAAGCGATGCTCACAGGGCAAACAG 

CA - C AAGGGCG ACAAGGG AGGCTCATGGTCCAGCAAGGGGTATGCTC AGAACGGG AGCAG 
960 970 980 990 1000

1000 1010 1020 1030 1040 1050 

TCCaAGAGTCCAACAGCAGAACTTCGGGGAAGCCGTCGGGGTGGCCAGGAACATAAACTAT 

T7CA0AACCCATCACCAGGCCC -CATCCCGGCAGGC CCAGGCACACAAACTA? 

1010 1020 1030 1040 1050 1060



fiC 1*1 (Zorl) 
1060 1070 1080 1090 1100 1110

GT ATAGGGGCCGGGGGCTTCTG - C - CCAGGGCTCCCCTGGACCAGGACGCCAGGTAGGGC 

ACA- - -GGGACTGGAGCTTCCGTCTCCAGATC - CTCCTGGGCCAGGGTGCCAGGCAGGAC 
1070 1080 1090 1100 1110

1120 1130 1140 1150 1160 1170 

AGGGAACCTCAGTAGTCCTCCACCCAGCCATTCTCAGAGATGAATGCGTCAATAACCTCC 

ATGGGGCCTCAATAGTCCTCT.^CCC^GCCGTTCTCAGAGATC 
1120 1130 1140 1150 1160 1170

1180 1190 1200 1210 1220 1230 

TTCATAGCCAAGTTGGGGATGAGCTGTTCCTGGGT 

TTCATGGCCAAGTTGGGGATGAGCTGTTCCTGGGTCAAAGGGCTC TCA 
1180 1190 1200 1210 1220 1230

1240 1250 1260 1270 1280

AAATGACCCACACGCTGCA GTGACAAGAAGGG-CAGAGGGCAGTCATGG- -GGCCCA 

AAGTGGCCCACACGCTGCAAGAGAGTCAAGAGTGTTCAA 
1240 1250 1260 1270 1280 1290 

1290 1300 1310 1320 1330 1340 

GG - ACC ATGCCACT GGCCCTG-CTCCCCCAGCCGCAGGCCTCACCTGCAGGTGCTC 

GGTACCAAGGCTCTCCATGGCCCGGTCTCCATGGGCC -CT - -CCTTACCTGCAGGTGCTC 
1300 1310 1320 1330 1340 1350 

1350 1360 1370 1380 1390 1400 

CTCGATGTCCTTGCGGTCGTAGGTGATGCCAC1TGGGCGTGATGCACGGCTCCCGCATCAG 

CTCAATGTCCTTGCGGTCATAGGTGATACCACTGGGTGTAATGCAGGGTTCCCGCATCAG 
1360 1370 1380 1390 1400 1410 

1410 1420 1430 1440 1450 1460 

CrCAAAGCrGATCTTCCCACACAGGTAGTC^ 

CTCAAAGCTAATCTTGCCACACAAGTAGTCAGGGATATCTCGCTTCTATAGCACAAGGG 
1420 1430 1440 1450 1460 1470 

1470 1480 1490 1500 1510 
ACACGCTC AG AGGCTG AAAAGGGGC ACTGC ACG AGCACC -TCCCAGCCATCCGC A 

AAAATGTCTACAACTGGAG - GGGGCTGTGGG - CGTC ACC ATACC ACC - AGC ACCCC ATC A 
1480 1490 1500 1510 1520 1530

1520 1530 1540 1550 1560 1570

GCAACCGACACACACTCACCTTCCTCTTCrCATCCACCTGAGAAAAAAGCTCGTCCATGT 



GCTTCCCGGCGTC-CTCACCTTTCTTTTCTCGTCCACCTGAGACAAGACCTCATCCATAT 
1540 1550 1560 1570 1580 1590 

1580 1590 1600 1610 1620 1630
CCjCCA rGTACTTGTCCTOTGAAGAGTTCAGTCCTGTGCrrCGCGCA GACACCCC 
CTGCCATGTATTTATCCTG- -CAGAGTTGAGTGCCATGTGTGGGCAACTCCTGTCTCCAC
1600 1610 1620 1630 1640 

1640 1650 1660 
AC CTCCC TCCTCCATGGGGCACA-GAC CCAACA CA- 



ACAGACACACACACTCTGTCCACCAGGGCACTCATGTCATGCATGGGCCAACAGATCCAC 
1650 1660 1670 1680 1690 1700 

1670 1680 1690 1700 1710 

- - - AGGCGGGGATGCT — -C CCACGCCACGTGCACACACACA — GACCCACATGTGG 



CAAAGGCTGGGGCACTTTTCATGCCACAC - ACAAACACACACACAATGACCCACATGTGG 
1710 1720 1730 1740 1750 1760 

1720 1730 1740 1750 1760 1770 

GTGGGGGGCACCCTCACGTGCTTGGCCTCAATGCACGCCTGCTGGGCCCGGACGTGGCTG 

ACTAGGGGCACCCTCACGTGCTTGGCCTCA.^TGCAGGC 
1770 1780 1790 1800 1810 1820 

1780 1790 1800 1810 1820 1830
TCGTCCTCATCACCCTCGTGGTTTCGCTGGCACTCTTC 

TCATCTTCATGACCCTCGTGGrrCCGCTGAC\CTCCTCCAGTTCCCTC 
1830 1840 1850 1860 1870 1880 

1840 1850 1860 1870 1880 

GAGCCGGTCAGAGATGGACCTGGCCAGATGT CTGACCACACCCCAATCTCAGA — GC 



AAGCTAGTTGGTGATGGCCCTGACCAGGAAATCACAGAGCCCGCCCCA-TCTCAGGCCTC 
1890 1900 1910 1920 1930 1940 

1890 1900 1910 

TAACATCCACA-CTTCCC CACATTT-C 



TTTCC 

1950 1960 1970 1980 1990 2000 

1920 1930 1940 
CTGCTTG CCAGTAAAGC CTTCGATAAAC - 



GACTCAGTCTCnPGCrGGGGGAGGGACCCACCTCTCTCGCTCAGCAGCAATGAGCCTGGTC 
2010 2020 2030 2040 2050 2060 

1950 1960 1970 
AAAAAAAAAAAAAAAAAAAACGGCGGCCG 



AGATATGAATGCAAAAAAAAAAAAAAAGCGCGGCCG 
2070 2080 2090 2100 
10 20 30 40 50 60

MJftUUC AATTCGGIOTCMKXKGVVGGVVGCCGG 

{, OU* J g — TCGACCCACGCGTCCG - -GCTGGCGGAGCAGGAGGATGGGCGAGCAGTCTGAATGCC 
10 20 30 40 50 

70 80 90 100 110 120 

AGAATGGATAACCGTTTTGCTACTGCGTTTGTGATTGCTTG 

AGAATGGATAACCGTTTTGCTACAGCATTTGTAATTGCTTGTGTGCT^ 

60 70 80 90 100 110 

130 140 150 160 170 180 

ACCATCTACATGGCGGCCTCCATAGGCACGGACTTCTGGTATGAGTATCGAAGTCCCATT 

ACCATCTACATGGCAGCCTCCATTGGCACAGACTTCTGGTATGAATATCGAAGTCCAGTT 
120 130 140 150 160 170 

190 200 210 220 230 240 

CAAGAGAATTCAAGTGACTCGAATAAAATCGCCTGGGAAGATTTCCTCGGTGACGAGGCG 

CAAGAAAATTCCAGTGATTTGAATAAAAGCATCTGGGATGAATTCATTA 
180 190 200 210 220 230 

250 260 270 280 290 300 

GATGAGAAGACTTACAACGATGTTCTGTTCCGATACAACGGCAGCTTGGGGCTGTGGAGA 

GATGAAAAGACTTATAATGATGCACTTITTCGATACAATGGCACAGTGGGATTGTGGAGA 
240 250 260 270 280 290 

310 320 330 340 350 360 

CGGTGCATCACCATACCCAAAAACACTCACTGGTATGCGCCACCGGAAACGACAGAGTCA 

CGGTGTATCACCATACCCAAAAACATGCATTGGTATAGCCCACCAGAAAGGACAGAGTCA 
300 310 320 330 340 350 

370 380 390 400 410 420 

TTTCATGTCTCTTACCAAATCCATGAGTTTCACACTAAAC 

TTTCATGTCGTCACAAAATGTGTGAGTTTCACACTAACrGAGCAG^ 
360 370 380 390 400 410 

430 440 450 460 470 480

GTGGACCCCGGCAACCACAATAGCCGCATCGACCTGCTTCGCACCTACCTGTCGCGCTGC 

GTTGATCCCGGAAACCACAATAGCGGGATTGATCTCCTTAGGACCTATCTTTCGCGTTGC 
420 430 440 450 460 470 

490 500 510 520 530 540 

CAGTTCCTTTTACCCTTCGTCAGCTTCGGCTTGA'rcTCCTTTGGGGCGTTGATTGGCCTC 
: : : : : : : : . : : ::::::::::::::::: : : : : : : : : :
CAGTTCCTTTTACCTTTTGTGAGTTTAGGTTTGATGTGCT^ 
480 490 500 510 520 530 



550 560 570 580 590 600 

TGTGCCTGTATCTGCCGCAGCCTGTATCCCACCCTCGCCACTGGCATTCTC 
: : : : : : ; : : : : : : : : : : : : : : : : :::::::::::::::::: 

UVfG C TTGCATTTGCCGAAGCTTATATCCCACCATTGCCACGGGCATTCTC 
540 550 560 570 580 590 

610 620 630 640 650

GCAGGTCTGTGCACA CTGGGCTCCGTGAGTTGCTATGTTG — C — CGGCATTGA — 

. :: : : 

GCAGGAAATTACTCAGATTCTTGGCTCC^TGAATAATTT^ 
600 610 620 630 640 650 

660 670 
ACTC TTACATC AGAAAGTAG- - 

: : : : 

TTGATAATTACTCATTTCTCAATAATCTTTTAATTTC 
660 670 680 690 700 710

680 690 

AGCT GCC CAAGG ATGTATCTGG 

TCCAAGCTCTTTAAATGGCCTTACAAACTCATTGGCAAGT^ 
720 730 740 750 760 770 

700 

AGAATTT GG ATGGT C 

ACCTTTTAGTTTTTCCAGTC^GCCATGCCTATGGTAG 
780 790 800 810 820 830 



710 

CTTC TGC ; -CTGGC 

: : : : : : : : : : : : 

CTTCGATCAATCTTGCAlTX3AGATTCCCATCCCCTrGAATCTAGCCTC 
840 850 860 870 880 890

720 730 
CTG — - t CGTCTC GCC TC 

TTTGACCAATAGAGTGTCCCTGAAATGACACTCTTCTCATGAG 
900 910 920 930 940 950 



740 

-CCTTA CAGTTC 



TCCTTAAACCAGTTCTCrTGGAACACTCAGTCTTAGAACATTCCCTCTCCAAACCCAGAT 
960 970 980 990 1000 1010 

750 760 
ATCGC - -CCCCCCT CT CTTCATCTC 

ACCATCCTGTCAACTCCACCCCACATGCACCTCTCCTCTGTACATCCTCCACCTCA^ATC 
1020 1030 1040 1050 1060 1070



WO 00/18904 



84/112 



PCT/US99/22817 



770 780 790 
-GGCTGCCCACA CCAACCG -GAAAGAGTAC 



CCAAGCTAAGCTCCCAACTGACAGCC^ACATCATTTCCAGCCATC 

1080 1090 1100 1110 1120 1130 

800 810 
ACCTTAA TGAAGGCTT ATC 

GGATGTCCAGCCTTAACAAGCCTTCAGAGGACTTC^ 

1140 1150 1160 1170 1180 1190

820 830 840 
GTGTGGC ATGAAGGG AGGCTG CCTG CT 

CCTTGTGAGACTCTAATAAAGAACCAACTAGCTGAGCCCAATCAACCTATGGAAC^ 
1200 1210 1220 1230 1240 1250 

850 860 870 
TAATGATTAATATTTTT CATACATTTTTTT 

GAAATAAAATGAATTGTTGTTTTGTGCCGCTAAAAAAAAAAA 

1260 1270 1280 1290 1300 1310 



GGCCGCCGC 
1320 



40 Clef!) 
10 20 30 40 50

Hl>M A U GTCGACCCACGCGTCCGGCGGCTAGGCCCGCGTGCGCTGGAGACCTCCGCGCTGGCCCC - 

MORI H £ TCCG-GTCCAN-GAAAAAGCT-GCTTGCACTAGGGGCATCC -CGCCTGCCTGG 

10 20 30 40 

60 70 80 90 100 110 

CGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGCATGGGTGGCC

TGAAAGGAACCG — CAGCACACAGGGTGGGAGGGCTTCCG — ATTTTAGCA-GGGCGGCT 

50 60 70 80 90 100 

120 130 140 150 160 170 

CCCGGGGCGCGG -GCTGGGTGGCGGCGCG -CCTGCTGCTCGGCGCGGGCGCCTGC - -TAC 
::::».::::::: . . : : . : : : : : : . , : 

TCCGGAAGGCGGAGCTC — CAACCCCATTTCCT — TTCTCTGGGCTGGTTCTGGCCCAGC 
110 120 130 140 150 160 

180 190 200 210 220 230 

TGCATTTACAGGCTGACCCGGGGTCGGCGGCGGGGCG ACCGCGAGCItrCGGATACGCT - C 
:::: : - : • : :: :: :::: : : :: ::::: ::. : 

TGCACCTGCGTG - TCGCCCTGGCTCCTCCGCT C - CCTGC - AGCTCCG AGGC AGCAGC 

170 180 190 200 210 

240 250 260 270 280 290 

TTCGAAGTC - CGCAGGTGCCCTGGAAG AAGGGACGTCAGAG - -CGTCAGTTCTCCGCCCC 
: ♦ :::: : :: :::: 

ATGCGTGCCGCGCGCGA - -CGTCGGCTGGGTGGCAGCAGCGCTGGTCCTGGGCGCCGGCG 

220 230 240 250 260 270 

300 310 320 330 340 

CTCGGC - -C CGCCCT -CAGACCGCAGGTACCTGGGAGTCACAGTG -CTCCAAG - A 

C - CTGCTACTGTATCTACCGGCTC ACTCGGGG - ACCCCGGCGAGGCGTCGCGACC.VTGCG 
280 290 300 310 320 330

350 360 370 380 390 

CC - -TCCCAG -CC- -TGAAGACTTAACTCATGGTTCATATGATGATGTTCTAAATGCTGA 
CCCTTCGCGATCCGCAG^GACCTAACCGATGGCTCCTATGACGATATCTTAAATGCAGA
340 350 360 370 380 390 

400 410 420 430 440 450 

ACAACTTCAGAAACTCCTTTACCTGCT^

GCAGCTT AAGAAACTTCTGTATCTGCTGGAGTCAACCGACGATCCTGTCA 

400 410 420 430 440 450 

460 470 480 490 500 510 

AGCTTTGATTACTTTGGGTAACAATGCA^

GGCCTTGGTCACCTTGGGAAATAATCCAGCCTTCTCCACTAACCAGGC 

460 470 480 490 500 510 

520 530 540 550 560 570 

Al'TGGGTGGTATTCC^TTGTTGCAAACAAAATC AACCATTCC — AACCAGAGTATTAAA 

GTTGGGTGGTATCCCAATTGTTGGAAACAAAATCAAC — TCCCTGAACCAAAGTATTAAA 
520 530 540 550 560 

580 590 600 610 620 630 

GAGAAAGCTTTAAATGCACTAAATAACCTGAGTGTGAATGTTGAAAATCAAA 

GAGAAAGCTTTAAATGCACTGAATAACCTGAGTGTGAATGTTC 
570 580 590 600 610 620 

640 650 660 670 680 

AAGATATACATCAGTCAAGTATGTGAGGATGTCTTCTCTGGTCCTC 

AAGATATACGTCCCTCAAGTCTGTGAGGACGTCTTTGCTGAC 
630 640 650 660 670 

700 710 720 730 740 750 

C ACCTCGCTGG ACTGACATTGTTG AC AAACATG AC TGTTACC AATG AC C ACCAGC ACATG 



690 
T182.hum.pep MNMTXJAKVLVAAWGLVAVIJ:

T182.mus.pep MTOflXJAHL^VAAVVGLVAIIXY^

T181.hum.pep MAQU^WAVASSFFCASI^SPjVH^

T181.mus.pep MAQI£AWAVASSFFT1*SI^^ 



T182.hum.pep TLQTDEVKNVPCGTSGGVTilYi

T182.mus.pep TLQTDEVKNVPCGTSGGVMI^

T181.hum.pep TLtfIT)EVKMVKXTSC^^

T181.mus.pep TLQTDEVKNVPCGT5GGVTOIYFDK 



T182.hum.pep HTLQE\T/IELFI^IDEtJLKQAJ^KDLNLMAPGLTI

T182.mus.pep HTI£EVYIELFIX2IDENLKQAIX^

T181.hum.pep hTI^EVYIELFIXJIDeOC^^

T181.mus.pep HT^EVYIELFT^IDOIIiC^^



T182.hum.pep XQKQKVVEKEAETERKKWIEAEK^

T182.mus.pep AQKQKVVEKEAETERKRAVIEAEKIAQ\AKI^

T181.hum.pep AQKQKWE3CEAETE3^^IEAEKVAQV^

T181.mus.pep AQKQKVVEKEAETERKKALIEAEKVAQVAEITYGQKV'MEKETEK . . .



T182.hum.pep Y.VtfiKYATSNKHKLTPEYLI^KKY^^

T182.mus.pep YAAHKYATSNKHKLTPEYL^^

T181.hum.pep YTAMKIA£ANKIJsLTPEY^^ PNMFMDSAGSV SKQFEGLADK

OI2C1 . a \TCAQKQADSNKIIXTKEYLELQKII^IASNNKI YYGDS I PQAFV — i % EGTTQQTV 



T182.hum.pep EALEPSCENVTQ--NKESTC
T182.mus.pep EAREPSGESPIQ--NKENAG
T181.hum.pep LSFCLE-DEPLETATKEN
10 20 30 40 50 60

inputs MATLWC«LIJUX3SI^LSCIJU^ 

MK LLSLVAW- -GCIi LVPPAEANKSSEDIRCXCICPPYRNISGHIYNQN 

10 20 30 40 

70 80 90 100 110 120 

inputs ISQKDCDCIJnnrePMPVRGPDVEAYCIJ^^ 

: ::::::: 

VSQKDCNCLHVVEPMPVPGHDVEAYCIXCECR YEERSTTTIKTO IVXYLS WGALIiLYMA 
50 60 70 80 90 100 

130 140 150 160 170 180 

inputs YLTLVEPILKRRLFGHAQLIQSDDDICTHQPFANA 

.: : : : . . . . : • 5 : :::::: 

FLMLVDP - LIRKPDAYTEQIJ^NEEENEDARSMAAAAASLGGPRA-^^^VLERVEGAQQRWK 
110 120 130 140 150 160 

190 

inputs LQVQEQRKSVFDRHWLSN 

• ■ • • • • ••• 

■ • » • • 

LQVQEQRKTVFDRHKMLSN 

170 180 



FIG. 43 
10 20 30 40 50 60

inputs MASLW03JIJ-RLGSGI£^CIJIL^^ 

M KLLCLVAW- -GCL LVPPAQANKSSEDIRCKCICPPYWIISGHIYNQ 

10 20 30 40 

70 80 90 100 110 120 

inputs NISQKDCDCLHVVEPMPVRGPDVEAYCLRCECKy 

WSQKDCNCIjnA^PMPVPGHDVEAYCLLCECR YEERSTTTIB^I IV! YLSWGALLLYM 
50 60 70 80 90 100 

130 140 150 160 170 180

inputs VYLTLVEPI liCRRLFGHSQLLQSDDDVGDTO 

AFLMLVDP - LIRKPDAYTEQLHNEEENEDARTMATAAAS IGGPRA - NTVLBRVEGAQQRW 
110 120 130 140 150 160

190 200 
inputs KLQVQEQRKSVFDRHWLSN 

SCLQVQEQRKTVFDRHKMLSH 
170 180 
Input file T187human1; Output file T187human1.pat
Sequence length 2490 

CCACGCGTCt£(XCACGCCCC(UaC^ 79 

TTGttCGCCCaKCCTCTCCGCCTCCGCCGCACOT 158

CTCGCCT(HSGAGAAGCCCCCCGG ACCC6 CCGG CC TGGACTGCCCGGTTAW 237 

CCCACCTCACACCCCATTTCCTTTCTCCACATCCACGTCACCTCGCGTTTGCTGTGGCCGCTAGGCCCGCCTCCGCTCG 316 

M G 2

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22

GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 



SKSA GALEEGTSEGOLCGRS 62 
TCG AAG TCC CCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571 



ARPOTGGTWESOUSKTSxPE 82 
GCC CGG CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAM CCT GAA 631 

DLTDGSYDDVLNAEQLQKLL 102
GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT 691 

YLLESTEDPVIIERALITLG 122
TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

NNAAFSVNQAIIRELGGIPI 142
AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT 811 

VANKINHSNQSIKEKALNAL 162
GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA GCT TTA AAT CCA CTA 871 

NNLSVNVENQIKIKVQVLKL 182
AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG GTG CAA GTT TTG AAA CTG 931 

LLNLSENPAMTEGLLRAQVD 202
CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT 991 

SSFLSLYDSHVAKEILLRVL 222
TCA TCA TTC CTT TYC CTT TAT GAC ACC CAC GTA GCA AAG GAC ATT CTT CTT CCA CTA CTT 1051 

TLFQNIKNCLKIEGHLAVQP 242
ACG CTA TTT CAG AAT ATA AAG AAC TCC CTC AAA ATA CAA GGC CAT TTA GCT GTG CAG CCT 1111 

TFTEGSLFFLLHGEECAQKI 262
ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT CCC CAG AAA ATA 1171 

RALVDHHDAEVKEKVVTIIP 282
AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC 1231 

K I * 285
AAA ATC TGA 1240 

TTGGTCATArTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1319 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1398 

ACTATTTTGATGCCAACTGAArATAAGAGCTTCTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1477 

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1556 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1635 



ATCCTAAGCTCTTGAGGCCATTCACCTCCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTACAATCTAr 1714 
TTTC6TCACTTCTACTCAATCAAAAATCTAAACTTTTAGGAGAGAATGTTTCCTAGGACTCACCCACTCCArTCAATGT 1793
TACATATAAAArAGTGTMTCMTCACAATGTCCATCmA 1872 
CCCTtOTGGGCCCOTCOTCTTCCCTGTAATC^^ 1951

GTTTGACACCAAGCCTGACCAATATGGAGAMCCCTCTCTCTACTAAGA^ 2030 

GCCTGTAATCCCAGCTACTTGGGAGGCCCAGGCAGGAGAATTGCT^ 2109 

ATAGCtXEATTGCACTCCAGCCTG&XAACA^ 2188 

TGTGCrTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTT^^ 2267 

TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2346 

CTACAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTAC^ 2425 

AATATGAGCCCAAATTGTArAATCTTTTTTTAATAAAGGCGAGAAAAATCAAAAAAAAAAAAAAA 2490 
Input file T187human23; Output file T187human23.pat
Sequence length 2595 

TTGCCMCCCCaXCTaCCOTTCCGGCCCACCCTCCGACCaaXCaCCCCCT 158
CT<ECCTGCCAGAAGCCGCCGGCACC(XCCGGGCTC(UGTGGC 237 
CCCAttTCAGAOXCATTTCCTTTaC^ 316 

M G 2 

AGACCTCCCC^TCCCCCCCCCCAGCCTCCTGC(XTCCCCCC(XCCTGCCGCTa ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTC CTG at GGC GCG GGC GCC TGC TAG 451 

CIYRLTRGRRRGDRELGIRS 42
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CCC GAG CTC GGG ATA CGC TCT 511 

SKSAEDLTDGSYDDVLNAEQ 62
TCG AAG TCC CCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

LQKLLYLLESTEDPVIIERA 82
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631 

LITLGNNAAFSVNQIPMKLV 102
TTG ATT ACT TTC GGT AAC AAT GCA CCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC 691 

TGITFAIIRELGGIPIVANK 122
ACT GGC ATC ACA TTC GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA 751 

INHSNQSIKEKALNALNNLS 142
ATC AAC CAT TCC AAC CAG AGT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG ACT 811 

VNVENQIKIKIYISQVCEDV 162
GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG ATA TAC ATC AGT CAA GTA TGT GAG GAT GTC 871 

FSGPLNSAVQLAGLTLLTNM 182
TTC TCT GGT CCT CTG AAC TCT GCT GTG CAG CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG 931 

TVTNDHQHMLHSYITDLFQV 202
ACT GTT ACC AAT GAC CAC CAG CAC ATG CTT CAC AGT TAC ATT ACA GAC CTG TTC CAG GTG 991 



LLTGNGNTKVQVLKLLLNLS 222
KTA CTT ACT GGA AAT GGA AAC ACG AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT 1051 



ENPAMTEGLLRAQVDSSFLS 242
GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC 1111 

LYDSHVAKEILLRVLTLFQN 262
CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT CTT CCA CTA CTT ACG CTA TTT CAG AAT 1171 

IKNCLKIEGHLAVQPTFTEG 282
ATA AAG AAC TCC CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT 1231 

SLFFLLHGEECAQKIRALVD 302

TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT 1291 

HHDAEVKEKVVTIIPKI* 320

CAC CAT GAT GCA CAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1345 

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1424 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1503 

ACTATTTtGArGCCAAGTGAATATAACAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1582 

GTTATCTTCCCTACArCAAGTCGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1661

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATCAA 1740 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAr 1819 



FIG. 47
TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGTTTCCT^ 1898
TACATATAAAATACTCTGATCAATCACAATGTCCATCTTTAGACACTTGGrrAAATAAATTATCTCGTCTTTGA 1977 

CCGTGCTGGGCGCGCTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 2056 

GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2135 

GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCACAGGTTGCAGTGAGGTGAG 2214

ATAGCGCCATTGCACTCCAGCCTGGCCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCG^ 2293 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2372 

TGTGTGTGT6TGTGTGTGTG7GTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2451 

CTAGAATCATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCnCTACGTACACACT 2530 

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2595 
Input file T187human123; Output file T187human123.pat
Sequence length 2700

CCACCCGTCCGCCCAGGGGCGGCAGGCACCM^ 79 

TTGCGCCCCCCCCCGTaCCGCOTGCGGCCCACCGT 158 

CTCGCCTCGGAGAAGCCCCCGGGACGCGCCCGGCTGGACTGGGCGGTTATAGGCTTTW 237 

CCGAGCTCAGACCCCATTTCCmCTCCACATCCACGTCAGGTGGCGTTTGCTGT 316 

M G 2 

AGACCTCCCCCCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCCGCGCCAGC AfG GGT 391 

GPRGAGWVAAGLLLGAGACY 22

GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 



SKSAGALEEGTSEGOLCGRS 62 
TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571 



ARPOTCGTUESOUSKTSOPE 82 
GCC CGG CCT CAG ACN GCA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAM CCT GAA 631 

DLTDGSYDDVLNAEQLQKLL 102
GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT CCT GAA CAA CTT CAG AAA CTC CTT 691 

YLLESTEDPVIIERALITLG 122
TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

NNAAFSVNQIPMKLVTGITF 142
AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG CTC ACT GGC ATC ACA TTC 811 

AIIRELGGIPIVANKINHSN 162
GCT ATT ATT CCT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC 871 

QSIKEKALNALNNLSVNVEN 182
CAG ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT 931 

QIKIKIYISQVCEDVFSGPL 202
CAA ATC AAG ATA AAG ATA TAC ATC AGT CAA GTA TGT GAG GAT GTC TTC TCT GGT CCT CTG 991 

NSAVQLAGLTLLTNMTVTND 222
AAC TCT GCT GTG CAG CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC 1051 

HQHMLHSYITDLFQVLLTGN 242
CAC CAG CAC ATG CTT CAC AGT TAC ATT ACA GAC CTG TTC CAG GTG KTA CTT ACT GGA AAT 1111 

GNTKVQVLKLLLNLSENPAM 262
GGA AAC ACG AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG 1171 

TEGLLRAQVDSSFLSLYDSH 282
ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC 1231

VAKEILLRVLTLFQNIKNCL 302
GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC 1291

KIEGHLAVQPTFTEGSLFFL 322
AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG 1351

LHGEECAQKIRALVDHHDAE 342
TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG 1411 

VKEKVVTIIPKI* 355 
GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1450 

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1529 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1608 

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1687
CTTATCTTCCCTACATGAAGTCCCACTAACCTTTTTCACATTTMCCTACCCTTCTACCTTTTCAACTGATTTCCACTT 1766
ACTCATCTGAGACAGCATCACTAmCACTAMTCATTGTnC^^ 1845 
ATCCTAACCTCTTCACCCCATTCACCTCCCAACCTGACCATACTCCTTTCAAAACTCTTTTCTCATCACTAGAA 1924 
TTTCCTCACTTCTACTCAATCAAAAATGTAAACTTTTACGACACAATGTTTCCTACGACTCACCCACTCCATTCAATGT 2003
TACATATAAAATAGTCTGATCAATCACAATGTCCATCTTTAGACAGTTCCTTAAATAAATTATCTCCTCTTTGAAAAGA 2082
CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 2161 
GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2240 
GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2319
ATAGCCCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2398 
TGTGCTTAAGTGCAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2477 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2556
CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2635
AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2700
Input file T187human12; Output File T187human12.pat
Sequence length 2523 

(XACGCCTCCGGCCAGGGCCCGGAK^ 79 

TTGCCGCCCCCCGCGTCTCCCCGTGGCGCGCACCCTCCGACCCGCCCCTCCCGCTGTCCAGCGCCCCCCACCCCCCCCC 158 

CTCGCCTGGGAGAAGCCGCCGGGACGCGCCGGGCTGGAGTGGGCGGTTATAGGCTTTGAGCTAGGCCGTTTCCGGGAGC 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCCTTTGCTGTGGCGGCTAGGCCCGCGTGCGCTGG 316

M G 2 

AGACCTCCGCGCTGGCCCCCCCGAGCCTCCTGCGCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22

GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511



SKSAGALEEGTSEGQLCGRS 62
TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571



ARPQTGGTWESQWSKTSXPE 82

GCC CGG CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAN CCT GAA 631 

DLTDGSYDDVLNAEQLQKLL 102

GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT 691 

YLLESTEDPVIIERALITLG 122

TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

NNAAFSVNQIPMKLVTGITF 142

AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC ACT GGC ATC ACA TTC 811

AIIRELGGIPIVANKINHSN 162

GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC 871 

QSIKEKALNALNNLSVNVEN 182

CAG AGT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT 931



QIKIKVQVLKLLLNLSENPA 202
CAA ATC AAG ATA AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC 991



MTEGLLRAQVDSSFLSLYDS 222
ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC 1051

HVAKEILLRVLTLFQNIKNC 242
CAC GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC 1111

LKIEGHLAVQPTFTEGSLFF 262
CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC 1171 

LLHGEECAQKIRALVDHHDA 282
CTG TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA 1231 

EVKEKVVTIIPKI* 296 
GAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1273

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1352

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1431

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1510 

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1589 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1668 

ATCCTAAGCTCTTCAGGCCATTCACCTGCCAACCTGACCATACTCCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1747
TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGT^ 1826
TACATATAAAATAGTCTGATCAATCACAATGTCCATCTTTAGAGiCTTCGTTAAATAAATTATCTCGTCTTTGAAAACA 1905 

CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 1984

GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2063

GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2142 

ATAGGGCCATTGCACrrCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTGUAAAAAAAAAAAAATGATGGAGCTCCGAA 2221 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2300

TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2379 

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2458

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2523 
Input file T187human2; Output File T187human2.pat
Sequence length 2418 

CCACCCCTCCCCCCACCOXCaSACaW^ 79 
TTCCCCGCCCCCttCTCTCC^ 158 

CCCAGCTCACACCCCATTTCCTTTCrCCACATCCACGTCACGTGGCCTTTGCTGTCCCCCCTACGCC^ 316 

M G 2

AGACCTCCGCCCTGGCCCCCGCGAGCCTCCTCCCCTGGCCCGGCCCTGCGGCTCTK^ ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451

CIYRLTRGRRRGDRELGIRS 42
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511

SKSAEDLTDGSYDDVLNAEQ 62
TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571

LQKLLYLLESTEDPVIIERA 82
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631

LITLGNNAAFSVNQIPMKLV 102
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC 691

TGITFAIIRELGGIPIVANK 122
ACT GGC ATC ACA TTC GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA 751

INHSNQSIKEKALNALNNLS 142
ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT 811

VNVENQIKIKVQVLKLLLNL 162
GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG 871

SENPAMTEGLLRAQVDSSFL 182
NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT 931

SLYDSHVAKEILLRVLTLFQ 202
TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG 991

NIKNCLKIEGHLAVQPTFTE 222
AAT ATA AAS AAC TGC CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA 1051

GSLFFLLHGEECAQKIRALV 242
GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT 1111

DHHDAEVKEKVVTIIPKI* 261
GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1168

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1247

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1326

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1405

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTCAAGTGATTTGCAGTT 1484

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1563

ATCCTAAGCTCTTGAGCCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1642 

TTTCCTCACTTCTAGTCAATCAAAAATGTAAACTTTTAGGAGAGAATGTTTCCTAGGACTCACCCACTCCATTCAATGT 1721

TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAACA 1800 

CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 1879

GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 1958
CCCTCTAATCCCAGCTACnCGGAGGO^ 2037

ATAGCGCCAntaACTCCAGCCTGGGCAACAAGAGCAAAAaCT^ 2116 

TGTGCTTAAGTGGAAACATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATT^ 2195 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2274

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2353

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2418
Input file T187human3; Output File T187human3.pat
Sequence length 2562 

nGCGCGCCCCGCCGTCTCCCCGTGGCGCGCACOT 158 
CTCGCCTGGGAGAAGCC(XCG<^CGCGCCGGGCTG(^TGGGCGGTTATA(^T^ GAGCTAGGCCGTTTCCGGGAGG 237 
CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCK 316

M G 2

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTCCGGCTCTGCCGCGGCGCCAG ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451

CIYRLTRGRRRGDRELGIRS 42
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511

SKSAEDLTDGSYDDVLNAEQ 62
TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571

LQKLLYLLESTEDPVIIERA 82
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631 

LITLGNNAAFSVNQAIIREL 102
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT ATT ATT CGT GAA TTG 691 

GGIPIVANKINHSNQSIKEK 122
GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA 751

ALNALNNLSVNVENQIKIKI 142
GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG ATA 811 

YISQVCEDVFSGPLNSAVQL 162
TAC ATC AGT CAA GTA TGT GAG GAT GTC TTC TCT GGT CCT CTG AAC TCT GCT GTG CAG CTG 871

AGLTLLTNMTVTNDHQHMLH 182
GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC CAC CAG CAC ATG CTT CAC 931



SYITDLFQVLLTGNGNTKVQ 202
AGT TAC ATT ACA GAC CTG TTC CAG GTG KTA CTT ACT GGA AAT GGA AAC ACG AAG GTG CAA 991 



VLKLLLNLSENPAMTEGLLR 222
GTT TTG AAA CTG CTT TTG AAT TTG WCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT 1051



AQVDSSFLSLYDSHVAKEIL 242
GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT 1111 

LRVLTLFQNIKNCLKIEGHL 262

CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT TTA 1171

AVQPTFTEGSLFFLLHGEEC 282 

GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT 1231

AQKIRALVDHHDAEVKEKVV 302

GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT GTA 1291

TIIPKI* 309

ACA ATA ATA CCC AAA ATC TGA 1312

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1391

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1470

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1549 

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTCAAGTGATTTGCAGTT 1628

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1707
ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTnCAAAAG 1786

TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGTTTC^ 1865 

TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1944

CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 2023 

GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2102

GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2181

ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2260

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2339 

TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2418

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2497

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2562
Input file T187human; Output File T187human.pat
Sequence length 2385 

CCACGCCTCCGCCCAGGOnnEGAC^ 79 

nccacccccccccTCTCCtxcraaaccACCCTccGACcaxcOT 158 

CTCCCCTCWSAGAAGCCCCCCCGACGCGCOICGCTCGAGTGCGCCGTTATACCCTTTCAGCTAGCC 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGG^ 316 

M G 2

AMCCTCCCCCCTGCCCCCCCCGAGCCTCCT^ ATG GGT 391

GPRGAGWVAAGLLLGAGACY 22

GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451

CIYRLTRGRRRGDRELGIRS 42 

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511

SKSAEDLTDGSYDDVLNAEQ 62

TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571

LQKLLYLLESTEDPVIIERA 82

CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631

LITLGNNAAFSVNQAIIREL 102

TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT ATT ATT CGT GAA TTG 691 

GGIPIVANKINHSNQSIKEK 122

GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA 751 

ALNALNNLSVNVENQIKIKV 142

GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG GTG 811 



QVLKLLLNLSENPAMTEGLL 162
CAA GTT TTG AAA CTG CTT TTG AAT TTG MCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC 871



RAQVDSSFLSLYDSHVAKEI 182
CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT 931

LLRVLTLFQNIKNCLKIEGH 202
CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT 991 

LAVQPTFTEGSLFFLLHGEE 222
TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA 1051

CAQKIRALVDHHDAEVKEKV 242
TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT 1111

VTIIPKI* 250
GTA ACA ATA ATA CCC AAA ATC TGA 1135

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1214 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1293 

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1372 

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1451

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1530 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1609

TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGTTTCCTAGGACTCACCCACTCCATTCAATGT 1688

TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1767 

CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 1846

GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 1925
GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACC^ 2004

ATAGCCCCAnCCACTCCACCCT(^XAAW(^CCAAAACTCTCT 2083 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTnTAAAACA 2162 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2241

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2320

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2385 



WO 00/18904 
Input file T181Atn*181a; Output File T181Att
Sequence length 3919 

GGGGTGTGGCGGTTTCTACGGTTGCACGGCGGTTCGGCT GTGT/ 

MAQLGAVVAV
ACTG ATG GCT CAG TTG GGA GCT GTT GTG GCC GTC

LFSAVHKIEEG 
CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA 

ALLTSTSGPGF
GCC CTG CTC ACC TCC ACC AGT GGC CCG GGT TTC

YKSVQTTLQTD
TAT AAG TCT GTA CAG ACC ACT CTC CAA ACT GAT

SGGVMIYFDRI
AGT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT t 

AVYDIVKNYTA
GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCA C 

KIHHELNQFCS
AAG ATC CAT CAT GAG CTT AAC CAG TTC TGC AGC G

ELFDQIDENLK
GAG CTG TTT GAT CAA ATT GAT GAA AAC CTC AAG T

MAPGLVIQAVR

ATG GCC CCT GGG CTC GTT ATC CAA GCT GTG CGA G 

IRRNYELMESEI
ATC CGC AGG AAC TAT GAG CTG ATG GAA AGC GAG to 

KQKVVEKEAETI
AAG CAG AAG GTG GTG GAA AAG GAG GCA GAA ACA Q 

EKVAQVAEITYC
GAA AAA GTG GCA CAG GTT GCA CAA ATC ACC TAT GC 

EKKISEIEDAAF
GAG AAG AAG ATC TCA GAA ATT GAA GAT GCT GCG TT 

DAECYTALKIAE
GAC GCT GAG TGC TAC ACA GCG CTG AAG ATC CCA GA 

EYLQLKKYKAIA
GAA TAC CTG CAG CTG ATG AAC TAC AAG GCC ATT CC 

KDIPNMFMDSAG
AAA GAC ATC CCC AAC ATC TTT ATG GAT TCC CCA GCl 

LSQDKLGFGLED
CTG AGC CAC GAC AAG CTG GCC TTT CGC CTA GAA GA1 

EN*

GAG AAC TGA 

GGAAACACTGTCTGCAACCTCTGCTCCGGCAGCTTACAGACACCTGT 
TCCTTTCCACACTACCTTCCTTGACTCTTCTTACTGTGGTTAAAAAG 
GAAGGGAGAGCAGATGGACAGTTGTTTTTTGCGTTTATTTTTAATTC 
GTATGCACCGTAGATTTGACCTCTCACCTCCAGACACCAACATTGTC 
ACTATGAAGACGGAGAGTGTGTCCTGCCTCCTCGTGCTTGAATTCCT 
TTCCCTCTAGTGTAGGCAGTGTCTGCGTGTGGCCCTCGTCACAGAAGI 
CGTTCCCCCCCTGGGCTTTTTGACTGAGTGCATTACTTGACAGTTAAi 
TGCTACGTTTTGCAACGTTTTCTACACACTGTACTCTCCTCTAGTGT1 
tCGCAGCCCCTGGAGGGACAGCCTGGATACAGGTTC 79

ASSFFCAS 18 
GCT TCC ACT TTC TTT TGT CCA TCT 137

HIGVYYRGG 38
CAT ATT GGA GTA TAT TAC AGA GGT GGT 197 

HLMLPFITS 58
CAT CTC ATG CTC CCG TTC ATC ACA TCC 257 

EVKNVPCGT 78
jAA GTG AAG AAC GTA CCA TGT GGA ACC 317 

EVVNFLVPN 98
:AA CTG GTG AAC TTC CTG CTC CCA AAT 377 

DYDKALIFN 118
IAC TAT GAC AAG GCC CTC ATC TTC AAC 437 

VHTLQEVYI 138
iTT CAT ACT CTT CAG GAA CTC TAT ATC 497 

LALQQDLTS 158
TG GCT TTG CAG CAG GAC CTG ACT TCC 557 

VTKPNIPEA 178
TG ACA AAG CCC AAT ATA CCT GAG GCA 617 

KTKLLIAAQ 198
KG ACG AAG CTT CTC ATT GCA CCC CAG 677 

RKKALIEA 218

IG AGG AAG AAG CCC CTC ATT GAG GCA 737 

QKVMEKET 238
IG CAA AAG GTG ATG GAG AAG GAG ACA 797 

LAREKAKA 258 

C CTG GCC CCG GAG AAG GCG AAG GCC 857 

ANKLKLTP 278
A CCA AAT AAG CTC AAG CTC ACT CCA 917 

SNSKIYFG 298
T TCC AAC ACC AAG ATT TAC TTC GCC 977 

GLGKQFEG 318
5 GGG CTG GCC AAG CAG TTT GAG GGG 1037 

EPLEAPTK 338
r GAG CCC CTC GAG GCA CCC ACA AAG 1097 

341 
1106 

ATTCTTTAAGATCAGACACAGCAAACCGCTCC 1185
GAAGAAATGGACACAAACTTACCCCCTTCTGG 1264
AGGTAAGTAACTTGTATGACTTCTGAGAAGGT 1343 
ACTTTGAAGCTGGTTTAAGTGGAGCTACTGTC 1422 
TCAGGGAAAAGTCTACTCCACAGTTCTCTCCC 1501 
CCCGTCTGCTGCGGAACATGAGCTGCAGAGAG 1580 
jCTGTCTTGAGCCCTTTTTAGCAAGAACTTGG 1659 



fTCTTGCCTACATCTCACCGCAGCAGGGCTTG 1738 
GTCACACCACACACTCCTTTTCCGTACTTTGACCTGATCTGTGATT^ 1817
CACTCAGCGTTAAGATGGGAACAAACAAGTGCTGTTAGCTGATGACGTAGCTCCTT^^ 1896 
GTGTGCCTAATTArGCGTATGCTTTTGAGACCAAACATCTTTATCATTATCGAGATTC^ 1975 
CTGTGGAGAAGGCCCCAGCCAGATGACACCCAAGTAGTAGTGCCTGTGGC^ 2054 
AAGAGACXAGGCAGCCACTTGAGAGTCGGCTCCAGTGAGTCACCCTAGGAAACTGAGAATGCGAAGAATAGATATGAGA 2133
GAAAGGGATTTCTTATCCTCAAATTGCACTGC(^TGGGGCTCrACCAT 2212 
GGAGGGCAGCTCTGCAGGTAATCTGCAGACATGGCAGTACCCTGTGCAACCATGACTGGCTCTAGCTTAGGACTTGGCC 2291
nGTTAGCTGGTCCCCTACCTCATCCTCCCCCCAWCAAAGCACCTACTGTTCTCTCTTAGGTGACTACTATAAATGGT 2370 
ATTTTCTGCCATCMTTCCCACCTCAGTTTTGGTTTTGTAAGTCGGGCCAGTTTGCT 2449
AGGTATTTGGGAAGCATTCAGCCGACCCAAAAAGAGGCAGGGTTCACTGTGCTTACT 2528
TCUCTCCTCAGCCCACTGACCCTCGCCACACTC™ 2607 
AAGCTCTTGCAAAAGTGGGTTTTTTTTCCCCAAGACGWCTCATCTTCTTCTCATTTGTTGCTGCTAACCACTTCTTGA 2686 
GAGCAACGTGCTATACCCAGCATCCTCTCTTGTACGTGCACCTGAGAAAACACTACTTCAGTGGAGTCCGTGCAGGAGG 2765 
GAG^ACCCCGCCATCCAGCGCCCTCCTAGCCCGAGAGGCTCTGTAACTAGCATTCT 2844
AAAGAGCCACAGTAAAGTCCTGCTGCAGCTGCTCCTTCCCTGCCCCTTTAATGTCACTTCTTTAACAGAACAGAAATGT 2923
CCCCATGTCATAGCATAAATTCAGTAGCTATTGGTATCTGTCCCAGCAGTAAAATCATGGAACTCAGATGTCTTTTTAG 3002 
CATGGGATGCCTAGCCCATCTGTCnTATGACCTTGTTTTTTGTMT 3081 
AAACATGTAAAATGTGATAAGCCTGCAGTTTTGTAGGCAGTGAATTCATAGCTCCTATTTTTAAGTAGAACTTCTATCA 3160
AAATACGTTAACCGTTTGTAAAATTCAGTTTTTGTAGGACTTTCCCAAGGCCCAGCCACCTTGGTAGAATGCTTCTCAC 3239 
TCACTAAATGTTGCAGAAGCAATTTATATTCCATATAGGTTTTTAATCACTTTTCAATATATGGTTAGAATGTTTGTAA 3318 
GGAAGCCTAAGTTTAATAATTTTTATATAACTAAAAATAGGTGTGGAGGACTCAGTGTGGGTACTGAGGAGGAATGAAG 3397
TGCTCTGAAAAGGGAGGTGTATAAACGCCCTGTGGGGCCGTGTGTCTTGTGAAAGTGACATAGCCGTGCTTACTGACCT 3476 
GGGCTGTCGTCAGCTGGCCGTCGGTAAACTACCTGGACAATAGCCCCTCTGTCTGGGAACTTTACCTACTTCCTTGTCC 3555
TCAGTGGGCTTCTAGCCACTGTTTGTTTCCTTATAAAAGCTGTAATGGGCAATCATGTGTTTGTACTTCCATTCCTTTT 3634 
TATCTCTACTTCTGTGTAAACTGGTGATTGAATAGTTAAAGCAATTTTTTCAGTGTGCCCCAAGGGCATTAATGAGCCT 3713
TTATAACTGACAAATGATTCTTGTTATAGTAATTATTCCATAAATGATACCACTAGATAAATTACCTTGGGTTAATAGC 3792
TCCAGGATTTGTTTCAGACAACAAAAAAAGCTCTCAATCTGAATATACTTACATTTTGGATTTAATTTCAGTCTTGCTA 3871
AATAAAATGTTTTTGTCTTTTTTTGATTAAGGTAAAAAAAAAAAAAAA 3919 
Input file T182mouse; Output File T182mouse.pat
Sequence length 3087 

MNMTQARL 8
GGAACCCCCCGTCCCCNGATCCCTCAaCACCCGACCAACAACG ATG AAT ATG ACT CAA GCC CGG CTT 68

LVAAVVCLVAILLYASIHKI 28
CTG GTG GCT CCA GTG GTG CGG TTG GTG GCG ATC CTC CTG TAC GCC TCC ATC CAC AAG ATC 128 

EEGHLAVYYRGGALLTSPSG 48
GAA GAG GGA CAC TTG GCC GTG TAC TAC AGG GGA GGA GCT TTG CTA ACG AGC CCC ACT GGA 188 

PGYHIMLPFITTFRSVQTTL 68
CCA GGC TAT CAT ATC ATG TTG CCT TTC ATT ACA ACA TTC AGA TCT GTG CAG ACA ACA CTA 248 

QTDEVKNVPCGTSGGVMIYI 88
CAA ACG GAT GAA GTT AAA AAT GTG CCT TGT GGA ACA AGT GGT GGA GTC ATG ATC TAT ATT 308 

DRIEVVNMLAPYAVFDIVRN 108
GAC CGA ATA GAA GTG GTT AAT ATG TTG GCT CCT TAT CCA GTG TTT GAC An CTG AGG AAC 368 

YTADYDKTLIFNKIHHELNQ 128
TAT ACT GCA GAC TAC GAC AAG ACT TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG 428 

FCSAHTLQEVYIELFDQIDE 148
TTT TGC AGT GCC CAC ACA CTT CAA GAA GTT TAC ATA GAA TTG TTT GAT CAA ATA GAT GAA 488 

NLKQALQKDLNTMAPGLTIQ 168
AAC CTG AAG CAG GCC CTG CAA AAA GAT TTA AAC ACC ATG GCC CCA GGT CTC ACT ATC CAG 548 

AVRVTKPKIPEAIRRNFELM 188
GCT GTG CCT GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT GAA TTA ATG 608 

EAEKTKLLIAAQKQKVVEKE 208
GAG GCA GAG AAG ACA AAA CTT CTC ATA GCT GCA CAG AAA CAA AAG GTG GTG GAG AAA GAA 668 

AETERKRAVIEAEKIAQVAK 228
GCT GAG ACG GAG AGG AAA AGG GCT GTT ATA GAA GCA GAG AAG ATT GCA CAA GTA GCA AAA 728 

IRFQQKVMEKETEKRISEIE 248
ATT CGA TTT CAA CAG AAA GTG ATG GAG AAA GAA ACT GAA AAA CGC ATT TCT GAG ATT GAA 788 

DAAFLAREKAKADAEYYAAH 268
GAT GCT GCG TTC CTG GCC CGA CAG AAG CCA AAA GCA GAT GCC GAG TAT TAC GCT GCA CAC 848 

KYATSNKHKLTPEYLELKKY 288
AAA TAC GCC ACC TCA AAC AAG CAC AAA CTG ACC CCA GAG TAT CTG GAG CTC AAG AAA TAC 908 

QAIASNSKIYFGSNIPSMFV 308
CAG GCC ATT GCC TCA AAC AGT AAG ATC TAC TTT GGC AGC AAC ATC CCC AGC ATG TTT GTG 968

DSSCALKYSDGRTGREDSLP 328
GAC TCC TCC TGT GCT CTG AAA TAC TCT GAT GGT ACG ACT CGG AGA GAA GAC TCC CTT CCC 1028 

PEEAREPSGESPIQNKENAG 348
CCA GAG GAG GCC CGT GAG CCC TCT GGA GAG AGC CCC ATC CAA AAC AAG GAG AAC GCA GGT 1088

* 349
TGA 1091

TGCMGAGGTGGAAArGTTCTCC^TATCAAGATGCGACCCAAGGGGCTAAGTGGGAACAGTGGTTATGTGGACTCGTA 1170

AGATTCACAGAGAATGTGTGCTCTGTTGTGATTCTCTTGTCATAGTCCTGGTTTGCCAGCTGACTACAGGATAGACCCA 1249 

GCTGTCTGGCACTCAAACGGTCTCTGCAGCCACAGTTTTATCAAGTATCCTGTATGTGTTCCTTTGTAAACCGGTACTC 1328 

ATGAATGAGGGAAAGTCTGATGCTAAGATACTGCCTGCACTGGAATGTCAAACACTATATAACAAGCTGTGGTTTTTAA 1407 

AAGCTATTGAATAATGTTTACATTGGTCCCTGAGGACATGTGTGCTCAGACATTCAAGACCTAGGAGGCCAGAGAGAAG 1486

ACCTTCAGAAAACGGTAAGTTAAAGAAGACAAGTGTCATCAGACACTTGGGACCCGGGCTCTCTTTAAAGTCTAGTCCC 1565 

GGCATTCCTCCATGTGATTGACAGCCAGACCTCTGGGTTCCCACCAAATTATCTTCCAGTTGAATGACCATTTACTTGA 1644 

TACAAATTGTACCTTTCTGTTTTTCTAGTCAGGTTGGTCGCCTGCAGGGACGCGTACTTTGCCACCCGACCAGAGGTTC 1723 
CTCGAAGATATTCCCAATCACTACTTTATTGCGTTAGCAGACTCACACATATAGAAACCAGCTCAAATTTAACCGAGAT 1802

AAACCCTCCACTCCACCAAACCTACCCGTCCCTCTCTTTCCTCTATTCACTCATCTCATCAACCT 1881

ATGTGTGACTAAAGTGCCCGGTTTTAGCCACAGACAACTGCnAGATGTCACCTCT^ 1960 

GCmAACCAWCATACGACCAGTGTCCAATTCaCArrCACTCCACACTATTATCTCAT^ 2039 

TGTTTTTAAAACTGGATTTGGGGCACATTCATTCACCCCAACACTTCT^ 2118

GTCACTAACACACTGATTCTCCTTAAAGTAATTCTCGAACTGT 2197 

TTGTCTCCTTCCCTGGCATGCAGATACCCAAGTTGCTTTTCCAACTTTCGCCTCCGCTAGGAGATCAGAAAGAATTCTT 2276

GTGACTTCCTGGCCAGCCATTGAATTCATTTTCCATGAGAAGATGACAGAGTTAGCCTGTGGCTATAGGAGATCATGTC 2355 

ATCCAGACCTTTTTGCCCATCACATTAACTTTCCTGGAATATTGTGCT^ 2434

TGACAGCTCTTGTGTATACTGTGTTGAAGCCAGACAGAAAAGTMTGGGGCCACTTCT 2513

TCACAGCAGCTAAAGGGnGTGCCAAACAnTTATTAAGAAAGTAAAGCCCAGATTTGAA^ 2592 

TTATAGTATAGAG6CATTTGTAATATGGAGAAMTAATTT 2671 

GTGTTCTnGGCCCTTCAAATACTGGTGTTACATTGTTGCTGCAGATAAATGATGATTGTCGTGGGATATCTCGA 2750 

TGAGCTCTGTGCTTTCATTCCTAGAGATGTTTCTCATTCCCArrTAGTGAAATGCTGTTCCCCCAAAGTGATGGT^ 2829 

GGATTTCTTACCGGTCATAGGCCCCGGTGAGGAGCAGGGAAGCGCCATTGTGAAAGATTAAAGAAAGCACTTCCACTTG 2908 

AGCTCCTTATGGAGTGAGCTTCCCTGTGCCCACTCAGTGAACTAAGTCTGACCATCCTTCAGGGACGTTCCTTTTGGTA 2987 

AATATACACTGTAATCTTTAAGTCTAAATTTATATGTGAAAGTTAACTTTTTTTAAAAACCTAAATAAAATTATTTTCC 3066 

TATCAAAAAAAAAAAAAAAAA 3087 
Input file T187Aymue064g11; Output File T187Ayinue064g11.pat
Sequence length 2883 

CTCCAayUWUUCCTCCTTCCACTAGGCCCATCaXCCTCCCTCCT 79 

TcccArmAccAGara^cccQwwaw^ 158 

M G G A R D 6

CACCTGCCTGTGCCCCTGGCTCCTCGGCTCCCTGCAGCTCCGAGGCAGCAGC ATG GGT GGC GCC CGG GAC 228

VGWVAAALVLGAGACYCIYR 26
GTG GGC TGG GTG GCA GCA GCG CTG GTC CTG GGC GCC GGC GCC TGC TAC TGT ATC TAC CGG 288 

LTRGPRRGGRRLRPSRSAED 46
CTG ACT CGG GGA CCG CGG CGA GGC GGT CGC CGA CTG CGC CCT TCG CGA TCC GCA GAA GAC 348 

LTDGSYDDILNAEQLKKLLY 66
CTA ACC GAT GGC TCC TAT GAC GAT ATC TTA AAT GCA GAG CAG CTT AAG AAA CTT CTG TAT 408 

LLESTDDPVITEKALVTLGN 86
CTG CTG GAG TCA ACC GAC GAT CCT GTC ATT ACT GAA AAG GCC TTG GTC ACC TTG GGA AAT 468 

NAAFSTNQAIIRELGGIPIV 106
AAT GCA GCC TTC TCC ACT AAC CAG GCC ATT ATT CGT GAG TTG GGT GGT ATC CCA ATT GTT 528 

GNKINSLNQSIKEKALNALN 126
GGA AAC AAA ATC AAC TCC CTG AAC CAA ACT ATT AAA GAG AAA GCT TTA AAT GCA CTG AAT 588 

NLSVNVENQTKIKIYVPQVC 146
AAC aC ACT GTG AAT GTT GAA AAT CAA ACT AAG ATA AAG ATA TAC GTC CCT CAA GTC TGT 648 

EDVFADPLNSAVQLAGLRLL 166
GAG GAC GTC TTT GCT GAC CCC CTG AAC TCT GCG GTG CAG CTG GCC GGA CTG AGG CTG CTG 708

TNMTVTNDYQHLLSGSVAGL 186
ACA AAC ATG ACG GTC ACC AAC GAC TAT CAG CAC CTG CTC AGC GGC TCC GTC GCT GGC CTG 768 

FHLLLLGNGSTKVQVLKLLL 206
TTC CAC CTG CTG CTG CTG GGA AAC GGA AGC ACC AAG GTC CAG GTT TTG AAG CTG CTT TTG 828 

NLSENSAMTEGLLSVQVSRL 226
AAT TTG TCT GAG AAT TCA GCC ATG ACA GAA GGA CTA CTG ACT GTC CAA GTA ACT AGA TTA 888 

PTRFISAHIQRF* 239
CCT ACC CGG TTC ATT AGT GCA CAC ATA CAG AGA TTT TGA 927 

CAAATAGATCTCCAAAGGTATGCCCAAAAACATTCACAGGAATTATTTCTGAAGATGAGTATTAAGCATATTTTGTTTT 1006 

TTAAAACTTCTCTGTGGCACCAGCAGACTTTCCATCTCTGGCCACTTTGCAGTATTTTTCTGTCACTGCATTTTAAAGT 1085

TTGTTTTTTTTGTGCATGTGTACCTCAGCATTTGCTGAAACAACTGTACTGAGTGAGTCCCCTGTGTGGGCTCGGTCCT 1164 

GAGCATTCAGCCAGCACCAGCAAGTTCTTAGTGTTCCCATGGAACTTAGGAGAAGCAACCATGTAACAAATTAG 1243 

CTGTTGAAAACATGTAACAAACCATTGAAACAGTCCCTGTGCTCTGAAGAAGGCCAGGCGGTGTGAGCCGTCTGCAGAA 1322 

ATCGAGCCATCTGCTCCGTCCTGTTACCAGAACTGTGTGTAAGAGCTMTGCTGATTGAACTMTGTGTTCTTACAAAA 1401 

ACTGGATAGATCCTAAAGGGGTTGGTTTCCCAAATGGCTACACTCTGGACTTCCAAAGAAATCTTAGTTTTTCCCCTAA 1480 

CAAAACGTCATTTTCACTTGTAACATGGAATAAAAATGAAACATGTCCCTTACGCTTGCCTGGAGTCAGACTTTTACAG 1559 

TGTTAACTAATGGATGCTGTTTTAAAATAGGACAGTGACGCTGTTTCCTCTTTCAGGTGGATTCTTCATTCCTTTCCCT 1638

f rATGACGGCCAAGTAGCAAATGAGATfCTTCTTCGGGCTCTTACACTGTTTCAGAATATAAACAACTGCCTCAAAGTG 1717 

GAAGCCCGGTTAGCTAATCAGATTCCTTTTGCTAAAGGGTCATTGTTTTTTCTGTTATACCGAGAAGAATGTGCCCAGA 1796

AAATGAGAGCTTTAGCCTGTCATCATGATGTGGATGTGAAAGAGAAAGCTTTAGCAATAAAGCCGAAATTCTGATCGGT 1875 

TGCTCCTATTTTTATCAAAGACTCAAACAGTAAGGCAGTCTTAAGTCAGCACACGGGAGCGTTTGCCTGCCTTTAAAAG 1954 

GGGTCTTTCAGCCGATGGAGTTAAACAATAAAAGTGAGTGAGCAGCTCTAATCCAACACGATGTTCAAAATTTTAGATT 2033

TTGGAGTAGTTCAGATTTGGGGTTTGGGGATTGAGTAGAGTCTGGAACCTTCCGAGGATGTGGATCATTTACGGGGCAA 2112



ACGTTTGGTTATGATCGTGGACAGACTGGCCATGCTCTTCAGGACTATTTGAA 2191

GAGCGGCTGTACTGAAGATACTTGCTGAGGTATTTAATGGTTTCCTGACACGAACTG 2270 

CTAACTCCTGGGAGCATTTGCAGTTGCTCATGAGACAGCGTTAAGTGCTGAGrT^ 2349 

ACmCTCCCTCAAACCACTCAATACTGCAACCTCCACTCCACCACCAACrc^ 2428 

ATCGTGAGACACTGCCTGCAGGATTTCTGATCAGTAGGACTGTACTCCCATTTACAT^ 2507 

ACCCCCTTGTGTAAGATACTGCAGAGCACTCCAAGCTTCCACCCACAGGCAGACAGCCCTTTAAAAAAGAGTGTCCTGA 2586

TAAGTCCAGATGGATACATGGAGAAAGATACCCATGAGATGGCTGCTTTGAAAGCATGCTGGGAAGCAATGTAITAGGG 2665 

TCCCGTGTCTTTTTmCTaCAGTAATGATAAATACACTTATACATGGACAGAACAnTCT 2744 

TTCTGGGACTGGGACTAGGGTACATAGATTTCTTTGTGTTCCTGTTTCTAaGf I IGCATTTGTACTGACCATAAATTG 2823 

TATAATTTTTTAATAAAAAGGAAAAATGCAAGGTGTACATAAAAAAAAAAAAAAAAAAAA 2883 
Input file T215Atn«21S; Output File T215AtnX215.pat
Sequence length 2744 

MELDRWAQLGLL 12
CTCGGTACCGACACAGCAACCCGAAACG ATG GAG CTA GAC AGA TGG GCG CAG TTG GGG CTG CTG 64 

FLQLLLISSLPREYTVINEA 32
TTC CTG CAG CTC CTT CTC ATC TCA TCG TTG CCA AGA GAG TAC ACG GTC ATT AAT GAA CCC 124 

CPGAEWNIMCRECCEYDQIE 52
TCT CCC GGA GCT GAG TGG AAC ATC ATG TGT AGA GAA TGT TGT GAA TAT CAT CAG ATT GAA 184 

CLCPGKKEVVGYTIPCCRNE 72
TCC CTC TGC CCA GGA AAG AAG GAA GTG GTG GGT TAC ACC ATC CCA TGC TGC AGG AAT GAG 244 

DNECDSCLIHPGCTIFENCK 92
GAT AAT GAA TGT GAC TCC TGT CTA ATT CAC CCA GGT TGT ACC ATC TTT GAA AAC TGC AAG 304 

SCRNGSWGGTLDDFYVKGFY 112
AGC TGC CGC AAT GGC TCC TGG GGC GGA ACT CTG GAT GAC TTC TAC GTG AAG GGA TTC TAC 364 

CAECRAGWYGGDCNRCGQVL 132
TGC GCA GAG TCC AGG GCA GGC TGG TAC GGA GGA GAC TGC ATG CCA TGT GGC CAG GTT CTT 424

RASKGQILLESYPLNAHCEW 152
CGA CCC TCA AAG GGT CAG ATC TTG TTG GAG ACC TAT CCC TTA AAC GCT CAC TCT GAA TGC 484

TIHARPGFIIQLRFGMLSLE 172
ACT ATT CAT GCC AGA CCT GGG TTT ATC ATC CAG TTG AGG TTT GGT ATG CTG ACC CTA GAG 544 

FDYMCQYDYVEVRDGDNSDS 192
TTT GAC TAC ATG TGC CAA TAT GAC TAT GTG GAG CTC CGC GAT GGG GAT AAT ACT GAC AGC 604 

PIIKRFCGNERPAPIRSTGS 212 
CCT ATC ATC AAG CGT TTC TGT GGC AAC GAG ACG CCA GCT CCC ATC AGG AGC ACT GGC TCT 664 

SLHVLFHSDGSKNFDGFHAV 232 
TCA CTC CAT CTC CTT TTC CAT TCT CAT GGC TCC AAG AAC TTC GAT GGC TTC CAC GCT GTC 724 

FEEITACSSSPCFHDGTCLL 252
TTT GAG GAG ATC ACA GCG TGC TCC TCA TCC CCT TGT TTC CAT GAT GGC ACA TCC CTC CTT 784 

DTTGSFKCACLAGYTGDRCE 272
GAC ACC ACT GCG TCT TTC AAG TGT GCC TCC CTG CCT CGC TAC ACT CCG CAG CGC TGT CAA 844 

NLLEERNCSDLGGPVNGYKK 292
AAT CTA CTT GAA GAA AGA AAC TGC TCA GAC CTT GGG GGG CCA GTC AAT GGG TAC AAG AAA 904 

ITEGPGLLNERHVKIGTVVS 312
ATC ACA GAA GGT CCT GGA CTT CTC AAT GAG CGC CAT GTA AAA ATT GGC ACC CTT CTG TCT 964

FFCNGSYVLSGREKRTCQQN 332
TTC TTT TGT AAC GGC TCA TAC GTT CTG ACT GCC AAT GAG AAA CGA ACT TGC CAG CAG AAT 1024 

GEWSGKQPVCNKACREPKIS 352
GGA GAG TGG TCA GGA AAG CAA CCT GTC TGC ATG AAA GCC TGC CGG GAA CCG AAG ATC TCA 1084 

DLVRRRVLSNQVQSRETPLH 372
CAC CTG CTG AGA AGG AGA GTC CTT TCG ATG CAG GTT CAG TCA AGG GAG ACA CCA TTA CAT 1144 

DLYSTAFSKQKLQQASTKKP 392
CAC CTT TAT TCC ACG CCT TTC ACC AAG CAG AAA TTC CAG CAT CCC TCT ACC AAA AAG CCA 1204 

ALPFGDLPPGYQHLMTQVQY 412
CCC CTT CCA TTT CGA CAC CTG CCC CCT CGA TAC CAA CAT CTC CAC ACC CAA GTC CAG TAT 1264 

ECISPFTRRLGSSRRTCIRT 432
CAG TGC ATC TCC CCC TTC TAC CGC CGC CTG GGA AGC AGC ACG AGG ACA TCC CTG ACA ACT 1324 

GKVSGRAPSCIPICGKIEST 452
GCC AAG TGC ACT GGG CCC CCC CCC TCC TGT ATC CCA ATC TGT CGA AAA ATC CAG ACC ACT 1384 

PSPKTQCTRWPWQAAIYRRT 472
CCT TCT CCA AAC ACC CAA CGC ACC CGC TCC CCA TCC CAC CCA CCC ATC TAC CCG ACC ACC 1444



SGVNDGGLHKGAVFIVCSGA 492
AGT GGT GTA CAC GAT GGT GGT CTG CAC AAA GGT CCA TGG TTC TTG GTC TGC AGT GGT GCC 1504 

LVNERTVVVAAHCVTELGKA 512
CTG GTG AAT GAA CGG ACT GTG GTT GTG GCT GCC CAC TGT GTG ACT GAG CTG GGG AAG GCC 1564 

TIIKTADLKVVLGKFTRDDD 532
ACC ATC ATC AAG ACA GCA GAC CTC AAG GTT GTC TTG GGA AAA TTC TAC AGG GAC GAT GAT 1624 

RDEKSIQNLRVSAIILHPNY 552
CGG GAT GAG AAG AGC ATC CAG AAT TTA CGG GTT TCT GCT ATC ATT CTG CAC CCC AAC TAT 1684 

DPILLDTDIAVLKLLDKARI 572
CAC CCT ATC CTG CTT GAC ACT GAC ATC GCT GTT CTG AAG CTC CTA GAC AAA GCT CGC ATC 1744 

STRVQPICLATTRDLSTSFQ 592
AGT ACC CGT GTC CAA CCC ATC TGC CTG GCT ACC ACT CGG GAC CTC AGC ACC TCT TTC CAG 1804 

ESHITVAGWNILADVRSPGF 612
GAA TCC CAC ATC ACT CTG GCT GGC TGG AAC ATC CTG GCA GAT CTG AGG AGC CCT GCC TTT 1864 

KNDTLHYGMVRVVDPMLCEE 632
AAG AAT GAT ACC TTA CAT TAT GGA ATG GTC AGA GTG GTA GAC CCA ATG CTT TGT GAG CAA 1924 

QHEDHGIPVSVTDNMFCASK 652
CAG CAT GAA GAC CAT GGC ATT CCA GTT AGT GTC ACT GAC AAC ATG TTC TGT GCC AGC AAA 1984 

DPSTPSDICTAETGGIAALS 672
GAT CCC AGT ACC CCT TCT GAC ATC TGC ACT GCA GAG ACA GGG GGC ATC CCT CCT TTG TCC 2044 

FPGRASPEPRWHLVGLVSWS 692 
TTC CCA GGC CGA GCA TCC CCC GAG CCA CGC TGG CAT TTG GTG GGG CTG GTC AGC TGC AGC 2104 

YDKTCSNGLSTAFTKVLPFK 712
TAT GAC AAG ACA TGT AGC AAT GGC CTA TCC ACA GCC TTC ACA AAG GTG TTG CCG TTC AAA 2164 

DWIERNMK* 721
GAC TGG ATT GAG AGA AAC ATG AAA TGA 2191

ACCAGCCACAAGGCCACTGAGAAGCCTTTTCCTAGCATCCGTCTGTA^ 2270 

TGTAATnTGCCCACCATCTTGCCTACTGAMGGCTCCTGGTTTCAGGGACTTATCTCAATAGAGCGTGAACAGACTTT 2349 

ACTTCATCAWJGAACTGTCTCCCTGACTGCTTGGGAATCATCTAAAAGATGCCAGGTCTTGCAAC^ 2428 

AAAGAAGACCATGTGACTAGAAGGAGAACCTCTTGCTCCTGCTCCACTCAGAGTGAT 2507 

TGAGAAGGTTGATTTGGGGAGGCCTGCGCTGCACCTGGCnCTGTCAAAGTT 2586 

CAGGGCAAAGGAGATTG(%TGTGGCACCCTGTGTAAATTGTCACAAGATTGTCTGATCCTTTCCCTTTCC^ATCTTCTG 2665 

TACACATTTCAATAAAACAAGGTCTGCTCCCTGACCTACCAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2744 




